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Introduction 

Endocrine therapy is often the least toxie and most effeetive treatment for hormone reeeptor positive invasive 
breast cancer. Sueh therapy includes antiestrogens (tamoxifen, fulvestrant) and aromatase inhibitors (AI; e.g., 
anastrozole, letrozole, exemestane). Tamoxifen (TAM) inereases disease free and overall survival in the adjuvant 
setting, reduees the incidenee of estrogen reeeptor positive disease (ER+; unless otherwise noted ER=ERa) in 
high-risk women, and reduces the rate of bone loss seeondary to osteoporosis in postmenopausal women [1,2], 
Aromatase inhibitors are effeetive only in the absence of functioning ovaries - TAM can be used regardless of 
menopausal status. Recent studies suggest that AIs may be superior to TAM in the adjuvant treatment of 
postmenopausal women with ER+ breast eancer; other studies report higher overall response rates with AI V5. 
TAM as first line therapy in the metastatic setting. Thus, a eontroversy in the management of patients with ER+ 
disease is whether an aromatase inhibitor or TAM should be given as first line endoerine therapy [3-9]. 

In this Clinieal Translational Researeh award, we propose to build elassifiers that accurately separate antiestrogen 
responsive from antiestrogen resistant/unresponsive breast tumors and begin to assist in the direction of specific 
endocrine treatments (antiestrogen V5. aromatase inhibitor) to individual patients. We hypothesize that endocrine 
responsiveness is affeeted by a gene network, rather than the aetivity of only one or two genes or signaling 
pathways [10-12]. Since the key components of sueh a network are unknown, we must study 10,000s of genes. 
We will use Affymetrix GeneChips. We will not identify mutational events, the presence of mRNA splice 
variants, or post-translational protein modifications. However, these factors have major effects on the 
transcriptome and their "footprints" should be identified by expression mieroarrays. 


Body 

Overview : We will build classifiers with the ultimate goal of separating antiestrogen sensitive from antiestrogen 
resistant breast tumors and begin to assist in the direetion of specific endocrine treatments (antiestrogen V5. 
aromatase inhibitor) to individual patients. To achieve this goal, and consistent with a CTR award, we will 
eonduct a 4-year, prospeetive, neoadjuvant study with an aromatase inhibitor (AI) or antiestrogen (Tamoxifen; 
TAM) as the only systemie therapy. We will obtain molecular profiles from Affymetrix GeneChips and further 
develop and apply our innovative bioinformatic and biostatistic methods to explore these high dimensional data 
sets and build/validate new classifiers. A more accurate predictor of endocrine responsiveness would have 
widespread clinieal use, allowing women and physicians to make more individualized and appropriate treatment 
decisions. Eor example, patients with tumors predicted to be resistant to antiestrogens and/or aromatase inhibitors 
would be strong candidates for an early intervention with cytotoxic chemotherapy. 

In most predictive/prognostic marker studies investigators focus on a single factor and whether they obtain a p- 
value that reaches eonventional statistical significance. Our approaeh is different beeause we will determine 
whether we can also find joint gene subsets that can separate patients into sufficiently distinct groups that should 
differ in their treatment. We will (1) analyze >33,000 genes on retrospective and prospeetive material, (2) apply 
(and develop/improve) new biostatistical and bioinformatic methods to identify potentially informative 
"biomarkers," (3) build neural network and biostatistical model classifiers, (4) evaluate the joint diseriminant 
power of selected genes coneurrently rather than as single biomarkers, (5) focus on prediction for individual 
patients where the assessment of a p-value is less important than the classification rate of our predictors, (6) 
validate the elassifiers in independent data sets, and (7) explore the ability of predietors to refine the targeting of 
specific endoerine therapies. 

Evidence has begun to accumulate suggesting that an AI might be a more effective first line endoerine therapy for 
some breast eancer patients than TAM. These data have generated considerable interest and controversy, in part 
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because unlike TAM, there are no long term studies with AIs where definitive survival data are available from up 
to 20 years of clinical follow-up. Our study could provide new and innovative insights into how to approach the 
more effective targeting of specific endocrine therapies to individual patients. 


Specific Aims 

We will complete two clinical studies and collect gene expression profiles from which to build predictors of 
endocrine responsiveness. Predictors will be built in Specific Aim 2 and validated in Specific Aim 3. 

Aim 1: Clinical Studies - Clinical Study-l (retrospective) is of pretreatment, single, frozen samples where we 
will compare the molecular profiles of tumors that recurred on TAM with those of tumors that did not recur. 
Clinical Studv-2 is a prospective study of breast tumor samples from patients treated with neoadjuvant TAM or 
AT 

Aim 2: We will develop and apply novel bioinformatics and biostatistics to discover gene subsets that define the 
molecular differences between endocrine sensitive and resistant breast tumors. 

Aim 3: We will test, optimize, and validate the performance of the classifiers from Aim 2 in retrospective studies 
of human breast tumors. 


Key Research Accomplishments 

As noted in previous reports, progress on the clinical goals for this award was greatly delayed because of the time 
taken to obtain DOD approval of our preexisting institutionally approved IRBs at Georgetown University and at 
the University of Edinburgh. All institutionally approved protocols and requested material were submitted to the 
DOD in July 2004; additional information was requested by the DOD several months later and submitted in 
November 2004. We did not receive final approval to proceed with the clinical studies until March 2005. Much of 
this delay seems to have been entirely unavoidable (see prior reports). While this affected progress on some tasks, 
we made significant strides and advanced the research program in each of the three primary tasks. Our 
development of new analytical procedures and the research infrastructure has been largely completed, although 
we will continue to develop these as they have applicability beyond the work in this study and because the 
technologies in the field continue to evolve. We will also continue (beyond the ending of this award) to follow 
patient performance and to update our clinical records for the prospective study and, as appropriate, reanalyze and 
report further progress. Publications supported since the commencement of this no-cost extension are listed under 
“Reportable Outcomes”; these constitute some of our major accomplishments in the past year. Since we are now 
working on publishing the data from these two clinical studies (retrospective; prospective), these and other key 
research accomplishments are presented in more detail below. 


Progress on our Statement of Work (narrative) 

• TASK 1. Array breast tumor samples from Clinical Studies 1 (retrospective) and 2 (prospective) 

We have obtained breast specimens from breast cancer patients treated with endocrine therapy (or not, i.e., 
surgery and radiation only in selected retrospective cases) as described in the original application. These 
specimens represent a mix of the initial prospective and retrospective specimens. All of these specimens have 
been fully analyzed and annotated by the study pathologist and total RNA has been extracted, and either stored, 
labeled and/or arrayed. Thus, we have also completed the assessment of tissue, RNA, and microarray data quality 
control as appropriate. 
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We requested that the speeimens be sent independent of the elinical information, so that we eould adequately and 
appropriately randomize the RNA preparation, labeling, and hybridization and minimize any operator-indueed or 
teehnology-indueed bias. All speeimens were proeessed using our standard operating proeedures; eaeh 
manipulation being performed by the same individual to further redueed inter-operator variability. Details of the 
methods, quality eontrol measures, and general experimental approaehes have been deseribed in detail in earlier 
annual reports. 

We have found the data from the two olinieal studies in this CTR to be partieularly useful in supporting other 
studies that are ongoing in the laboratory. For example, these data have been used to support ROl applieations on 
genes we identified and deseribed in the preliminary data for this application. One of these, which is focused on 
components of a gene network associated with endocrine resistance in breast cancer, was supported, in part, by 
preliminary data obtained by analysis of the data generated specifically in this CTR application (R01-CA131465; 
awarded in June 2009). Indeed, we have found the unique datasets from this CTR to be particularly powerful for 
exploring the potential of individual genes to act as biomarkers of endocrine resistance (in addition to the neural 
network classifiers described later in this report). These genes can be obtained from several sources including as 
individual genes selected by analysis of the data in the retrospective and/or prospective clinical studies, and/or 
from other work in our laboratory. 

As in prior years, we have also used these data to provide preliminary data on gene expression values that have 
led to our colleagues initiating other studies directed at developing therapeutic strategies to target individual 
genes we have identified from within this data set or from other sources. One of the genes described in our initial 
preliminary data and studied further as an individual gene in the data from this CTR has led directly to the very 
recent identification of a potential new breast cancer drug. This biomarker gene (XBPl) was discussed in the 
original application and in subsequent annual reports, and we have now identified a small molecule inhibitor of its 
activation that seems to have activity against breast cancer cells. While this molecule was developed primarily 
using institutional funds (and so it is not discussed further in this report), it represents a definitive step towards 
addressing one of the longer term goals of this award. 

While, as noted in prior reports, we had prioritized the analysis of data from the retrospective study material 
because these have definitive clinical outcomes (survival), as noted for Tasks 2/3 (below), we have also analyzed 
the material from our prospective study. We will continue to collect information on recurrence status of the 
patients accrued to the prospective study because we expect informative events that will strengthen our study to 
continue to occur. Since many endocrine treated breast cancers tend to recur in later years, it is to be expected that 
we may obtain our most valuable data (recurrence) after this award has ended. While we report our initial 
analyses of the prospective study (see Tasks 2/3), we expect to conduct further analysis in the future. We are fully 
committed to generating the most complete analysis we can perform, and we believe that we have made very 
strong progress in this task. We will continue to report the outcomes of future research on these data sets in the 
literature, and to submit these publications to the DOD. We will bear the costs of these future analyses and 
clinical follow up studies from other sources as appropriate. 

We have completed almost all of the work on this Task as anticipated in prior reports. We have now also 
completed a major analysis of the retrospective study (Clinical Study-1) described in the approved report 
submitted last year (2008) - see Tasks 2/3 below. For the prospective study (Clinical Study-2), we will continue to 
obtain outcomes data and to update the database and research analyses as the clinical information dictates. 
Analyses are also provided below (Tasks 2/3). 


• TASK 2, Store, process, and train/optimize classifiers from gene expression microarray data (already 
modified to reflect our adoption of caArray and other caBIG tools in prior reports) 
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We have made significant progress on addressing the infrastructure goals in this task, largely as a consequence of 
our involvement in the National Cancer Institute Center caBIG project (as also noted in prior reports). The PI (Dr. 
Clarke) also leads the Lombardi Comprehensive Cancer Center’s (LCCC) caBIG team and we have been actively 
involved in the development of caArray (caBIG’s grid-enabled, MIAME compliant, microarray database). Our 
work in the development of the informatics infrastructure that currently houses the clinical and microarray data 
generated in this CTR, and the analytical tools we have developed in part with support from this CTR (as 
proposed in the original application) have been particularly productive. Most notable, our work in this area has 
led to the successful award of a caBIG In Silico Center of Excellence (RFP CA-S09-094; July 2009-2012; Dr. 
Clarke is the Pl/Scientific Director; Dr. Madhavan is the other co-PI and the Project Manager): a subcontract to 
our colleagues at Virginia Tech is also included (same laboratories as were supported in this CTR). This new 
award will allow us to continue to advance and improve the clinical-translational research infrastructure 
development that we described in our prior reports. The research infrastructure will be further developed to 
support cancer research within the LCCC and shared fully with the broader research community. Our adoption of 
caBIG and our activities in this large federally funded research community also allows us to share our expertise 
and tools with many other cancer researchers. 

We also continue to further develop and optimize our data analysis algorithms, with particular success in the 
design of new approaches for network analysis. We have found approaching this goal to be realistic in a much 
shorter time-frame than initially expected and have now published several relevant articles. We also continue to 
improve our existing algorithms for data analysis. Relevant publications in this area are also included below in the 
section “Reportable Outcomes.” 

We have now acquired the data for analysis of the endocrine therapies and outcomes. We have gone substantially 
beyond the initial analysis reported last year. We now report optimized and validated classifiers of TAM 
responsiveness (Clinical Study-1). Please note that we include initial validation studies (Task 3) here also for the 
sake of brevity and clarity. We also provide analysis of the data from the prospective study (Clinical Study-2). 
The data analysis format follows that reported last year; it is shown again here to allow reviewers to follow the 
procedures. 


Support Vector Machine with Recursive Feature Elimination 



The data analysis design is shown above (LOOCV = leave-one-out cross validation). 

From this scheme, testing and optimization of the classifier is performed using 10-fold crossvalidation within the 
BC030280 data set for the analysis of Clinical Study-1. 
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We have determined that the existing published data sets we used previously are somewhat useful (some of these 
eoncerns were indieated in last year’s report) but we eould not use at least two of these to validate our endoerine 
responsiveness classifiers (Task 3). The only data set available for such validation has been published by Loi et 
al. [13]. Unfortunately, the dataset they used was created by combining data from multiple centers/studies. To 
address our concerns with the Loi dataset and to provide a more rigorous validation dataset, we selected a subset 
of their data, where the patients, their treatments, outcomes, and follow-up more closely match those in the 
BC030280 data set. Thus, we now have our single institution dataset (BC030280 data set from the retrospective 
clinical study is probably still the only such dataset with meaningful follow-up), and an adequate validation data 
set (validation is a key aspect of Task 3). 

In our analyses, the datasets now enable us to ask, for the first time, important and previously unanswered 
questions in clinical endocrine resistance. In this report, we describe, for the first time, an innovative analysis in 
which we separate early recurrent disease from late recurrent disease (other analyses will continue as we test 
other hypotheses in these data sets). The hypothesis being tested is that early recurrences are likely to be simply 
reflective of poor prognosis, whereas later recurrences have a different biology that is more reflective of true 
endocrine resistance. If we cannot accurately separate these two groups, then the underlying biology of early vs. 
late recurrent disease is probably the same. However, either outcome is almost equally important and informative, 
and of great significance both for the direction of therapy and understanding for understanding the biology of 
endocrine sensitive and resistant breast cancer. For example, patients that will recur early are unlikely to receive 
major benefit from TAM therapy. Those at risk of recurring later, may require more intensive follow up for a 
prolonged period. Understanding the biology driving the late recurrences could also lead to the development of 
new drugs to prevent or delay further the emergence of recurrent disease. 

For Clinical Study-1, the predictive classifiers were built, trained/optimized using the BC030280 data set. 

For each analysis, we set challenging recurrence endpoints: <3 yrs V5. late >15 yrs (we studied only distant 
recurrent breast cancers). Performance was evaluated against a broad series of preset benchmarks (expanded 
substantially from those described in last year’s report), i.e., >70% performance in each of accuracy, sensitivity, 
specificity (from the receiver operating characteristic curve; ROC); AUC>0.8; >0.70 negative predictive value 
(NPV); >0.70 positive predictive value (PPV). We also estimated the hazard ratios and visualized the data using 
standard Kaplan-Meier plots and required models to achieve log rank test p<0.05 and HR>2.0. 

Results for the optimized classifier BC030280 of early (<3 yrs) v^. late (>15 yrs) recurrence on TAM. 


Accuracy 

Specificity 

Sensitivity 

AUC 

PPV 

NPV 

P-vaiue 

Hazard 

Ratio 

0.93 

1.00 

0.80 

0.82 

1.00 

0.91 

0.0006 

3.28 



As evident from the data, the optimized classifier exceeded or met all requirements. 
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As we have previously noted, independent validation is essential to aeeount for over-fitting and to assess the 
likely robustness of a elassifier aeross other data sets [14]. Thus, independent validation was done in a subset of 
the published dataset of Loi, S., Haibe-Kains, B., Desmedt, C., Wirapati, P., Lallemand, F., Tutt, A. M., Gillet, C., 
Ellis, P., Ryder, K., Reid, J. F., Daidone, M. G., Pierotti, M. A., Bems, E. M., Jansen, M. P., Eoekens, J. A., 
Delorenzi, M., Bontempi, G., Pieeart, M. J. and Sotiriou, C. Predicting prognosis using molecular profiling in 
estrogen receptor-positive breast cancer treated with tamoxifen. BMC Genomics, 9: 239, 2008 

Results for validation of the optimized classifier BC030280 of early (<3 yrs) V5. late (>15 yrs) recurrence 

on TAM (Eoi data set). 


Accuracy 

Specificity 

Sensitivity 

AUC 

PPV 

NPV 

P-vaiue 

Hazard 

Ratio 

0.74 

0.75 

0.74 

0.83 

0.82 

0.64 

0.01 

2.35 



As evident from the data, the optimized classifier exceeded all but one of the stringent performance requirements. 
These data provide definitive validation of the performance of the classifier. 

The data from the studies described above provide 
compelling evidence that there are fundamental 
differences in the biology of ER+ breast cancer that will 
recur early (<3 yrs) vs. those that will recur much later 
(>15 yrs). We have begun to explore these data to 
understand the biological differences that may explain 
why these two groups of recurrent breast cancers are 
different. To date, we have applied our Differential 
Dependency Network analysis to extract small preliminary 
subnetworks. While these studies are still in progress, the 
figure shown here reports our first model. We have no 
immediate interpretation of these genes from a 
mechanistic study, as this initial model implicates many 
genes not previously known to be associated with breast 
cancer. However, altered function of nuclear receptors is 
strongly implicated by the inclusion of NCOA7 (nuclear receptor coactivator 7; ERAP140), which binds ER- 
alpha and mediates gene regulation [15]. IKZEl has very recently been associated with poor outcome in acute 
lymphoblastic leukemia [16]. 
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To understand the mechanistie differences between 
breast tumors that recur on TAM (at any time point) 
and those that do not, we have begun to explore the 
differential expression of genes across the entire 
BC030280 dataset. Focusing initially upon genes 
known to encode kinases, and applying a novel 
method (currently undergoing further development 
prior to publication), we have already obtained a 
preliminary insight into signaling within these breast 
tumors. The fig to the right shows the outcome from 
this first-pass analysis. Many known kinases and 
their targets are linked throughout this pathway. 

Additional studies will continue as we attempt to 
understand mechanism and to identify new targets 
for drug discovery. 

We have fewer specimens and less definitive 
outcomes data for the prospective AI study 
incorporated into Clinical Study-2, and there are no 
adequate published data sets for validation. 

Nonetheless, we have attempted an initial classifier 
trained by crossvalidation. While we would not expect to be able to meet the rather stringent panel of 
performance measures used for the study above on Clinical Study-1, we have explored prediction potential with 
the data currently available. For this analysis, we used only data from the AI treated cases. Since the data set is 
smaller than that for the TAM only retrospective study, cases were defined as “good” (partial response + minimal 
response) and “bad” (no change + progressive disease) based on clinical response. 

Results from the initial model using all samples are deceptively good and almost certainly reflect an overfitting of 
the model to the data. 




Acxxiracy 

Sensitivity 

Specificity 

100% 

100% 

100% 


We then used a leave-one-out crossvalidation to address possible overfitting. As is evident from the results, 
performance of an Al-only model is limited, only the measure of sensitivity approaches our performance 
benchmarks. This likely reflects the fewer samplers available for study. While we obtained the specimens we 
requested for this application, we are currently exploring the possibility of obtaining additional specimens/data 
from our collaborators in Edinburgh, i.e., Drs. Dixon and Miller. Such data will likely dramatically improve 
performance of the AI classifier. We are also in the process of using the TAM alone predictor from Clinical 
Study-1 to predict outcomes from the TAM and AI specimens in Clinical Trial-2. 


Acxxiracy 

Sensitivity 

Specificity 

58% 

67% 

33% 


Interestingly, there is no overlap in the listing of differentially expressed genes from the early vs. late TAM 
recurrence data sets and the AI responders vs. non-responders data. This observation suggests that those genes 
responsible for responsiveness to AIs are not the same as those directing TAM responsiveness. Overall, we 
believe that progress on Tasks 2/3 is fully consistent with our initial goals. 
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• TASK 3. Test, optimize, and validate the performanee of the elassifiers from Aim 2 in retrospeetive 
studies of human breast tumors. 

We proposed to rank and prioritize seleeted joint genes from RNA classifier built and optimized in Task 2 
(above) and retrain/reoptimize the initial neural network classifier. Finally, we will validate IHC classifier on 
independent data sets (data sets not used to build and train the classifiers). 

Please note that, for the sake of clarity, the outcomes from independent validation (in the reconstructed Loi data 
set) are incorporated above in the report for Task 2. 

As acknowledged in prior reports, we remain unable to move this later part of Task 3 (IHC studies) substantially 
forward on the timeframe as initially proposed because of the delays in getting approval to work with the clinical 
specimens (this aspect of Task 3 cannot begin until Tasks 1 and 2 are almost complete). Thus, we will address the 
IHC studies in the future at our expense as necessary and appropriate. Nonetheless, we have been able to use 
published data to validate our classifiers. This is now becoming standard practice in the field and it allowed us the 
flexibility to test multiple classifiers (we have presented only the classifiers that rank highest across our 
performance benchmarks). Hence, prioritizing validation studies based on published data has allowed us to obtain 
sufficient testing, optimization and validation to meet the primary goal of this task. The delay with the IHC 
studies also allowed us to be somewhat more productive and successful with some aspects of Tasks 1 and 2 than 
we had initially anticipated. We are currently preparing the results from the two clinical studies for publication. 


Accomplishments (for this one-year reporting period) 

• Continued processing of specimens received from Edinburgh 

• Obtained corresponding gene expression microarray data 

• Continued improving electronic data storage and annotation infrastructure 

• Updated data contained within the data storage and annotation infrastructure 

• Built, tested and optimized classifier of TAM responsiveness using data stored in this electronic 
infrastructure 

• Validated classifier of TAM responsiveness 

• Built, tested, and optimized initial classifier of AI responsiveness 

• Used data from this CTR to support additional (successful) applications for research funding to broaden 
the breast cancer research directions represented in this CTR 

• Built, validated, and published new data analysis methods 

o cross phenotype normalization 
o differential dependency network method 

o multi-scale independent component analysis method for biomarker identification 

• Began to identify new signaling network components associated with endocrine responsiveness 


Page 11 




Award Number: W81XWH-04-1-0570 


PI: Robert Clarke, Ph.D., D.Sc. 


Reportable Outcomes 


Papers and Meeting Reports* 


New Publications (for the present reporting period) 

• Wang, Y., Miller, DJ. & Clarke, R. “Approaches to working in high dimensional data spaces: gene 
expression microarrays.” Cancer, 98: 1023-1028,2008. 


• Shajahan, A.N., Riggins, R.B. & Clarke, R. “Apoptosis, cell death and breast cancer” Chapter 8. In: 
“Breast cancer: prognosis, treatment, and prevention. ” Eds: Pasqualini, J.R., Informa Healthcare; New 
York, NY; ppl37-156, 2008. 
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Several other manuscripts also are submitted and in preparation - these will be provided to DOD in the future. 
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coauthors from one or both of our subcontracts. Thus, our interdisciplinary and multi-institutional program 
worked very effectively and eollaboratively; this should further be apparent in the development of new 
informatics methods (Virginia Polytechnic and State University subeontract) and the large number of high quality 
breast tumor speeimens we obtained from the University of Edinburgh. 


Conclusions 

We have completed almost all of the work as proposed in the original application. We made significant progress 
on the researeh infrastructure goals and in the development and optimization of the methods needed for data 
analysis. We also have completed and published all of the data presented as preliminary data in the initial 
applieation and many of the results presented in prior reports. The clinical studies were held up by an 
unexpectedly long delay in obtaining final approval for our existing protoeols - as noted by previous reviewers of 
our annual reports, this delay adversely affeeted the prospective study. Consistent with the recommendation of 
these prior reviewers, it was neeessary to request a one-year no eost extension. This report is being submitted at 
the end of this period. We now present a TAM-specific and an Al-specific classifier. The former performs better 
than comparable classifiers, and it has been adequately validated in an independent data set. We are committed to 
eontinuing to follow the performance of patients on the prospective study and hope to obtain additional data to 
strengthen the performance of classifiers built on the prospeetively acerued data. We have also begun to obtain 
important new insights into the biology of breast eancer in the eontext of its endoerine responsiveness. We have 
also begun to identify new individual biomarkers and potential targets for drug discovery. Thus, we have also 
positioned the work from this CTR to enable us to pursue aggressively the long terms goals of this researeh 
program. Overall, we are excited about the new knowledge gained from this study, the potential to move the 
research forward and make an impact on breast eancer to the benefit of patients. At the same time, we have 
suecessfully opened up new area of breast eancer research that has already attracted additional peer-reviewed 
support. 

We wish to conelude by expressing our thanks to the DOD for its support of this important researeh program. We 
also wish to extend our thanks and admiration to those eonsumers who have, and who continue, to work tirelessly 
in support the DOD Breast Cancer Researeh Program that makes possible the work we have reported here and 
that performed by many of our peers. 
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Gene expression microarrays provide a wealth of information on 
gene expression patterns and cancer pathways with potential for 

(1) cancer diagnosis, prognosis, and prediction of therapeutic 
responsiveness (Ramaswamy et al, 2001; Dupuy and Simon, 2007); 

(2) discovering new cancer subtypes (Golub et al, 1999; Lange et al, 
2004); and (3) identifying cancer-associated (signalling) molecular 
markers and their complex interactions (Shedden et al, 2003; 
Ransohoff, 2004). However, achieving these biological/clinical 
objectives requires comprehensive analysis of microarray gene 
expression profiles that exist in high-dimensional data spaces, and 
relies critically on the functional capabilities and accuracy of the 
relevant analytical techniques (Allison et al, 2006). Cancer 
diagnosis/prognosis and therapeutic responsiveness prediction 
are all supervised classification/prediction problems (Duda et al, 
2001). Analysing gene expression patterns representing patients 
that manifest heterogeneous clinical outcomes to discover cancer 
subgroups amounts to an unsupervised clustering problem (Duda 
et al, 2001). Identification of cancer-associated markers can be cast 
either as supervised feature/gene selection or as multiple testing, 
with thousands of candidate markers and a small subset of true 
ones (Ransohoff, 2004). 

Although these analytical tasks fall neatly within statistical 
learning and pattern recognition (Jain et al, 2000), there is nothing 
conventional about these tasks for microarray data analysis. 
Unlike conventional pattern recognition that involves moderately 
dimensioned data, usually less than 100 features per sample and 
hundreds to thousands of samples, microarrays often involve over 
10 000 features/gene per sample (n) with typically at most several 
hundred clinical samples. A rule of thumb is to have at least 10 
training samples per feature dimension (Jain et al, 2000), whereas 
in microarrays this ratio is often closer to 0.01 samples per 
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dimension (Allison et al, 2006). High feature dimensionality and 
paucity of microarray samples pose unique challenges for, and 
inspire novel developments in, predictive classification, cluster 
discovery, and marker identification methodologies. 

A common subtask is feature selection. For predictive classifica¬ 
tion, only a subset of discriminatory genes is used to avoid 
overfitting, where a classifier is known ‘too well’ to fit even 
irreproducible ‘noisy’ training patterns and, thus, to achieve 
predictive accuracy that generalises well to unseen/test data. In 
unsupervised clustering in high dimensions, feature selection is 
likewise essential for discerning the underlying grouping tendency 
that may be ‘buried’ in a much lower-dimensional subspace - with 
many structurally irrelevant features yet small sample size, 
clustering algorithms are likely to identify false group structure. 
Lastly, a separate objective is to identify cancer-associated genes 
and their joint effects, rather than to simply build a predictive 
model for the disease. 

Although feature selection is integral to each of these analytical 
tasks, an exhaustive search of all 2"—1 possible feature subsets is 
prohibitive for a large n. Thus, practical feature selection 
techniques are of necessity heuristic, with an inherent accuracy/ 
complexity trade off. Moreover, while multivariate analysis 
methods based on complex criterion functions may reveal subtle 
joint marker effects, they are also prone to overfitting (Lai et al, 
2006). Additionally, high dimensionality compromises the ability 
to validate marker discovery, which requires accurately measuring 
true and false discovery rates (Ransohoff, 2005). These issues have 
prompted the development of a variety of novel statistical methods 
for estimating (or controlling for) false discoveries (Storey, 2003). 

PREDICTIVE CLASSIFICATION 

Performance of a predictive model depends on the interrelation¬ 
ship between sample size, data dimensionality, and model 
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complexity. The accuracy of learned models tends to deteriorate in 
high dimensions, a phenomenon called the ‘curse of dimension¬ 
ality’ (Duda et al, 2001). This phenomenon is illustrated for 
classification by an example by Trunk (1979). Consider two equally 
probable, normally distributed classes with common variance in 
each dimension. For the feature indexed by n = 1,2,3..., class 1 has 
mean and class 2 has mean — l/n^“. Thus, each additional 

feature has some class discrimination power, albeit diminishing as 
n increases. Trunk evaluated error rates for the Bayes decision 
rule, applied as a function of n, when the variance is assumed 
known but the class means are estimated based on a finite data set. 
Trunk found that (1) the best test error was achieved using a finite 
number of features; (2) using an infinite number of features, test 
error degrades to the accuracy of random guessing; and (3) the 
optimal dimensionality increases with increasing sample size. 
These observations are consistent with the ‘bias/variance dilemma’ 
(Jain et al, 2000). Simple models may be biased but will have low 
variance. More complex models have greater representation power 
(low bias) but overfit to the particrdar training set (high variance). 
Thus, the large variance associated with using many features 
(including those with modest discrimination power) defeats 
any possible classification benefit derived from these features 
(Figure 1). With severe limits on available samples in microarray 
studies, complex models using high-feature dimensions will 
severely overfit greatly compromising classification performance. 
Computational learning theory provides distribution-free bounds 
on generalisation accuracy in terms of a classifier’s capacity related 
to model complexity (Vapnik, 1998). Relevance of these bounds to 
the microarray domain is discussed by Aliferis et al (2006). 

There are some strategies for mitigating the aforementioned 
problem. One is to fit the high-dimensional data, but using simple 
models that restrict complexity such as naive Bayes models that 
assume features are conditionally independent or even simpler 
models that share some parameters across classes (Novovicova 
et al, 1996). Another approach is to apply support vector machines 
(SVMs), which attempt to avoid overfitting by finding a linear 
discriminant function (or generalised linear discriminant) that 
maximises the margin (the minimum distance of any sample point 
to the decision boundary) (Vapnik, 1998). The number of free 
parameters in SVMs is not a function of the dimensionality, but 
instead is upper-bounded by the number of samples, which for 


<-High bias Low bias ->• 



Figure I A demonstration of the bias/variance diiemma in predictive 
classification. Specifically, the error of model fitting can be decomposed into 
two components, bias (approximation error) and variance (estimation 
error). Added dimensions can degrade the prediction performance if the 
sample size is small relative to the dimensionality. For a fixed sample size in 
the high-dimensional data space, there is a trade off between the decreased 
approximation error and the increased estimation emor. 


microarrays is much smaller (Ramaswamy et al, 2001). Flowever, 
whether using linear or nonlinear kernels, SVMs are not immune 
to the curse of dimensionality. Finally, some methods aim to 
reduce the amount of parameter learning to avoid overfitting, 
achieved by regularisation techniques modifying the training 
objective function or limiting the parameter learning cycles (Duda 
et al, 2001). 

Many microarray-based studies suggest that, irrespective of the 
classification method, feature selection is vital for achieving good 
generalisation performance (Statnikov et al, 2005). The vast 
number of feature subsets necessitates applying heuristic search 
techniques, with various accuracy/computation trade offs (Guyon 
and Elisseeff, 2003). Filtering methods apply knowledge of the class 
labels to evaluate the discrimination power either of individual 
genes (univariate) or collections of genes (multivariate), based on 
criteria such as signal-to-noise ratio, correlation measures, and 
mutual information, before classifier training. A recent study 
found that for small sample sizes, univariate methods faired 
comparably to multivariate methods, whose performance may be 
affected by overfitting (Lai et al, 2006). 

Unlike filtering, wrapper-based approaches combine feature 
selection and classifier training, with the classifier learning 
algorithm repeatedly applied for different feature subsets and 
with the best subset chosen based on a specified criterion (Jain 
et al, 2000). These methods can improve predictive power by 
capturing higher order (and complex, nonlinear) joint feature 
effects. Perhaps the simplest example is the ‘noisy XOR problem’, 
for which two individual features and their linear combinations 
have no discrimination power, but a simple nonlinear combination 
is perfectly discriminating (Duda et al, 2001; Guyon and Elisseeff, 
2003; Figure 2). 

Wrapper algorithms, specified by the subset search method and 
the criterion for evaluating feature subsets, entail large computa¬ 
tion in high dimensions, as the number of candidate spaces 
evaluated grows with the dimension. These algorithms include 
‘greedy’ forward selection, with ‘informative’ features added 
starting from a null set. Other algorithms apply a backward 
search, which starts from the full space and then eliminates 
features. Floating (bidirectional) searches, which combine forward 
and backward steps, and more complex simulated annealing and 
genetic algorithms, can also be applied (Guyon and Elisseeff, 2003). 
Finally, there are methods that integrate classifier training and 
feature selection, such as decision trees, which essentially perform 
forward feature selection while growing a tree and backward 
elimination while pruning the tree (Duda et al, 2001). For 
evaluation criteria, either predictive accuracy on held-out test 
data (Statnikov et al, 2005), or criteria that can be evaluated solely 
on training data such as classifier margin or Bayesian model 
selection criteria (Guyon et al, 2002), can be used. 


UNSUPERVISED CLUSTERING 

In microarray data analysis, unsupervised clustering must be 
cautiously applied and may be unnecessary when samples come 
with appropriate and reliable supervising labels (Ramaswamy et al, 
2001; Clarke et al, 2008). However, unsupervised clustering 
constitutes an important tool for discovering underlying cancer 
subtypes or gene modules (Frey and Dueck, 2007; Miller et al, 
2008). Such exploration may suggest possible refinement to 
established cancer categories, where cancer subtypes manifest 
radically different clinical behaviour and may correspond to 
distinct biological pathways involving subtype-specific markers 
(Shedden et al, 2003). For example, prostate cancer can be an 
indolent cancer, remaining dormant throughout life, or an 
aggressive cancer leading to death. Similar issues arise in drug- 
resistance cases, where different cancer subtypes exhibit distinctive 
therapeutic responses (Golub et al, 1999). 
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Figure 2 An example of XOFl/chessboard-like joint effects. Although the classes consist of disjoint clusters, each variable has completely overlapping class 
conditional densities, that is, no marginal effect. In contrast, working together, the two variables provide good class separability. 


Furthermore, when therapeutic responsiveness of patients is 
assessed based on interim growth or shrinkage of a tumour rather 
than the definitive clinical outcome, unsupervised clustering may 
be used to validate this supervision information, either to support 
it or to raise uncertainty about this ‘ground-truth’ if the correlation 
between the cluster labels and assessed responsiveness is weak. 
Moreover, trusted class labels on samples can be withheld during 
unsupervised clustering and subsequently used to validate the 
clustering methodology/assumptions. Strong correlation between 
clustering outcomes and known class labels supports the applic¬ 
ability of this clustering approach to other unlabelled microarray 
data (Golub et al, 1999). 

While warranted in microarray data exploration, unsupervised 
clustering is extremely challenging in high dimensions with very 
few samples. Standard methods such as K-means and hierarchical 
clustering evaluate distances between data points using all (equally 
weighted) features. Thus, many noisy/irrelevant features will 
dominate the (much smaller set of) relevant features in determin¬ 
ing how data points are partitioned, for example, many invariantly 
expressed genes used for microarray normalisation are irrelevant 
to classification or clustering. Rather than clustering samples using 
all genes, a practical alternative is to embed gene selection within 
unsupervised clustering - removal of noisy features improves 
clustering accuracy, which, in turn, guides a more accurate round 
of feature selection. Methods have been proposed along these lines 
(Xing and Karp, 2001; Graham and Miller, 2006), together with 
novel initialisation schemes (Frey and Dueck, 2007; Wang et al, 
2007). 

Another major challenge for clustering in high dimensions is 
estimating the number of clusters. Standard methods choose 
cluster number by best fitting the data while incurring least model 


complexity. However, under the widely used Bayesian information 
criterion (Duda et al, 2001), model complexity is linear in the 
number of parameters and quickly grows with each added feature. 
As many of these parameters model noisy/irrelevant features, their 
data fitting benefit is grossly outweighed by their contribution to 
model complexity, which leads to gross underestimation of the 
number of clusters. In a study by Graham and Miller (2006), a 
‘parsimonious’ mixture model allows clusters to share distribu¬ 
tions for noisy features, which enhances accuracy in estimating 
both the cluster parameters and the cluster number in high 
dimensions. Intrinsic to this modelling is identification of a 
distinct relevant feature subset specific to each sample cluster, that 
is, for the microarray domain, each subclass will have its own gene 
set, as has been conjectured by Shedden et al (2003); Ein-Dor et al 
(2005). Another strategy for identifying this cluster structure is 
top-down divisive clustering that explores and generates hier¬ 
archical mixtures in nested subspaces (Wang et al, 2007). By 
projecting high-dimensional data of a current cluster to multiple 
two-dimensional visualisation subspaces, the human gift for 
pattern recognition can be exploited to assess the current solution 
and assist further clustering refinement (Figure 3). Being more 
data-adaptive and process-transparent, human interaction may 
bring subjectivity, and thus must be carefully applied. 


MARKER IDENTIFICATION 

Marker identification aims to discover those genes and their 
complex interaction effects that have statistically significant 
correlations with cancer phenotypes. As it is currently largely 
unclear how molecular variants and their interactions determine 
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Figure 3 An example of coarse-to-fine top-down divisive unsupervised clustering using VISDA. (A) Multiple complementary visualisation subspaces 
derived from different data structure preserving projection principles. (B) Tree of phenotype with embedded model selection function, where MO refers to 
the model order (number of clusters) and DL refers to the description length (model complexity as a function of cluster number). 


cancer pathogenesis and propensity, marker identification is 
valuable for improving understanding of the molecular mechan¬ 
isms of cancers and for suggesting novel drug targets. Discovered 
markers may also define a subset of networked causal genes that 
regulate disease phenotype. A review of the current state of this 
effort is discussed by Aliferis et al (2006). 

The objectives of feature selection for predictive classification 
and marker identification bear close resemblance. Although it is 
tempting to view these two problems as ‘one and the same’, this is 
often inappropriate. Inclusion of some true cancer markers in a 
feature set for cancer classification may provide negligible 
improvement in classification accuracy even though these markers 
are significantly associated with the cancer outcome of interest. A 
trivial example is where two markers are perfectly correlated, in 
which case only one of the two needs to be included in a predictive 
feature subset. A more interesting example is the one in which. 


even though two markers are only partially correlated, a 
classification model will not perceive any benefit from using both 
markers. This is illustrated below: 

Let A and B take on one of the four possible discrete values and 
suppose the ground-truth statistics on class label C are Prob[C = 
‘cancer’jA = 1] = 1.0; Profc[C=‘cancer’jA = i] = 0.5, i = 2,3,4; 
Proii[C=‘cancer’|B = 3] = 1.0; and Proh[C=‘cancer’|B =j] = 0.5, 
j= 1,2,4. Suppose Profo[A = 1] = 0.1; Profc[B = 3] = 0.7; and 
Prob[B = 3|A = 1] = 0.5. Thus, A and B are both informative about 
the disease (for one value), and these variables are only partially 
correlated. However, in a small training set, it is quite possible that 
each time A = 1, B = 3 also occurs, even though Prob[B = 3|A = 1] 
is much less than one. In this case, while association-based marker 
discovery might include both A and B, classification-based marker 
discovery would only include B, because the training set suggests 
no predictive benefit from including A. 
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More generally, whether predictive gene selection will include a 
gene that possesses some predictive benefit will depend on the 
sensitivity of the criterion function used. For example, a predictive 
model may achieve the same estimated classification error 
rate using several different feature subsets, even if there is a 
unique true marker subset, with greatest class discrimination 
power. Another limitation of predictive gene selection is that most 
classification models lack interpretability, that is, they do not allow 
easy discernment of the underlying interactions between the 
identified markers. The sole focus of most predictive feature 
selection techniques is on defeating the curse of dimensionality. 
Exceptions to this include decision trees (if not too large) and 
Bayesian networks (Duda et al, 2001). 

Although association-based approaches may ultimately be found 
superior for identifying cancer markers and their interactions, 
these methods also have limitations. First, identifying marker 
interactions, particularly those involving markers with insignif¬ 
icant marginal effect, requires an exhaustive search over the full 
gene expression space. It is only practical to examine very low- 
order interactions, for example, ‘10 000 choose 2 or 3' possible 
interactions (Jain et al, 2000). Thus, higher-order interactions may 
get missed. One possible strategy is to first apply classification- 
based gene selection to significantly reduce the search space, 
followed by (exhaustive search) association-based marker identi¬ 
fication. Second, it is difficult to evaluate and/or control inference 
accuracy for such testing, which involves numerous hypotheses. 
There is an inherent trade off between statistical power (true 
positive) and Type 1 error (false positive). Multiple testing for 
thousands of interacting genes at typical confidence levels leads to 
unacceptably large false positives. Family-wise error rate techni¬ 
ques can compensate, but conservatively toward minimising false 
positives and may have insufficient power. Other strategies 
improve inference accuracy through variance shrinkage that 
accounts for statistical dependencies between genes via computa¬ 
tionally intensive permutation testing to accurately specify the null 
distribution. 

To assess the true statistical significance of the implicated gene 
subset in multiple testing, one recent method is the randomisa¬ 
tion-permutation test (Efron and Tibshirani, 2007). This method 
addresses the concern that a randomly selected gene subset may 
appear to possess significant association with the phenotype if only 
subjected to subject permutation testing. To assure that such false 
discoveries do not occur, a selected gene subset must, additionally, 
be subjected to a gene randomisation test, where the subject 
permutation test is to assess whether the implicated gene subset 
indeed has significant prediction power rather than ‘by-chance’ 
and the gene randomisation test assesses whether the implicated 
gene subset has significant prediction power as compared with that 
of any randomly selected gene subset of the same size. 

An additional concern in marker identification is the impact of 
confounding variables (Ransohoff, 2005). A given data set may 
represent a biased sample with respect to factors such as patient 
age, gender, life style or with respect to sample handling, and 
expression levels for a putative marker may be more strongly 
associated with these confounding effects than with disease 
presence (Clarke et al, 2008). Although some confounding effects 
can be mitigated by careful study design or by explicitly 
accounting for these factors when performing marker identifica¬ 
tion, further research is needed to devise more effective 
methodologies for this purpose. Nevertheless, risk factors are not 
confounding effects to be discounted - there may be cancer-related 
gene - environment interactions that need to be identified. Finally, 
there are latent confounding sources due to biological multi¬ 
modality. For complex phenotypes such as cancers, the presence of 
multiple, interrelated biological processes may obscure the true 
relationships between a gene subset and a specific outcome, 
creating spurious associations that appear statistically correct and 
yet may be false. 


OUTCOME VALIDATION 

In assessing the performance for any of three fundamental tasks, a 
validation procedure must be carefully designed, recognising limits 
on the accuracy of estimated performance, in particular for small 
sample size. In the study by Dupuy and Simon (2007), it was shown 
that, in more than 50% of a representative sample of past studies, 
inadequate statistical validation was performed. Clearly, classifica¬ 
tion accuracy must be assessed on labelled samples ‘unseen’ during 
training. However, single batch held-out test data are often precluded 
in microarray studies, as there wiU be insufficient samples for both 
accurate classifier training and accurate validation. The alternative is 
a sound cross-validation procedure, wherein all the data are used for 
both training and testing, but with held-out samples in a testing fold 
not used for any phase of classifier training, including feature 
selection and classifier design. Furthermore, performance (for either 
predictive classification or marker identification) depends on the 
threshold used to discriminate between categories. Most reported 
prediction accuracy rates are based on user-defined thresholds for a 
single operating point. A more meaningful estimate is the receiver 
operating characteristic curve obtained by using sensitivity (true 
positive rate) and specificity (true negative rate) acquired at a set of 
threshold values. The area under the curve gives a comprehensive 
figure-of-merit for prediction accuracy and can be shown to be a 
consistent but more sensitive measure than error rate for comparing 
classifiers, identifying performance differences between classifiers in 
cases where, evaluated solely by error rate, two classifiers would be 
deemed equivalent (Swets, 1988; Wang et al, 2006). 

Unlike predictive classification assessment using labelled samples, 
validating unsupervised clustering requires alternative avenues 
when labels are not available. Synthetic data with constructed 
ground-truth may be used to assess the accuracy of a clustering or 
cluster number estimation algorithm. However, this approach will 
not validate that particular statistical assumptions are suitable for 
fitting molecular profiles from a given population. Alternatively, 
some form of cross-validation may be used to assess the ‘stability’ of 
clustering solutions (Lange et al, 2004). Stability analysis has been 
applied to clustering microarrays by Yeung et al (2001). Even when 
class labels are known, Dupuy and Simon (2007) suggest not to use 
them to select the gene space, as this will bias the clustering results. 

It is even less likely to have ground-truth for validating marker 
identification. Synthetic data constructed from real microarray 
data can be used to assess a marker identification methodology, 
with class labels, markers, and interaction models handpicked and 
treated as ground truth. Importantly, ‘reproducibility’ of marker 
identification outcomes over multiple/bootstrap data sets may 
provide reasonable confidence (uncertainty assessment) on the 
discovered markers (Ransohoff, 2004). 

Ultimately, discovered cancer markers or subtypes must be 
validated against definitive biomedical ground-truth. However, the 
cost of such validation demands a high degree of confidence in the 
knowledge extracted from microarray data by marker identifica¬ 
tion and clustering algorithms. Specifically, such knowledge 
extraction should not strongly depend on the particular random 
sample of data used or on variable aspects of the algorithms. Many 
clustering algorithms find only locally optimal solutions whose 
quality depends on the pseudorandomly chosen initial cluster 
parameter values (Frey and Dueck, 2007). Also, greedy sequential 
feature selection techniques are often ‘unstable’, giving results that 
may be highly dependent upon the particular training data used. 
There are two implications. First, whether synthetic data or real 
microarray data are used, extracted knowledge should be validated 
by assessing its reproducibility over multiple independently 
acquired data sets. Independent data sets are easily produced in 
the synthetic case, but at high cost in the case of real data. The 
second implication is that algorithms should be made as stable as 
possible to maximise the generalisation of their results. For marker 
discovery, one such strategy is to perform marker ranking multiple 
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times, using bootstrap samples and/or fc-fold cross-validation from 
the same data set, with the final, selected markers the ones with 
highest average ranking (and perhaps low variance/uncertainty). 
Nevertheless, the cost of increased stability in such approaches is 
an increase in computation. 
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INTRODUCTION 

In 2007, approximately 213,000 new cases of invasive 
breast cancer will be diagnosed and over 41,000 American 
women will die of this disease (Jemal et al., 2007). Lifetime 
risk of developing breast cancer is modified by several 
factors related to development (e.g., weight at birth, age at 
menarche), reproductive life (e.g., parity, lactation, age 
at menopause), lifestyle (e.g., obesity, alcohol consumption), 
and inheritance (e.g., mutant BRCAl) (Ahlgren et al., 2004; 
de Jong et al., 2002; Feigelson et al., 2004; HuUta and Stark, 
1995). Despite the importance of family history, the altered 
expression/function of tumor suppressor genes such as 
BRCAl/2 and TP53 do not account for the high prevalence 
of sporadic or non-BRCA familial hreast cancers. Among 
mutant BRCAl/2 carriers, the timing of breast cancer onset 
and progress can vary substantially, but the factors respon¬ 
sible for these variations are not fully understood (Nathanson 
et al., 2001). The precise molecular events responsible for 
affecting disease progression remain unknown in both spo¬ 
radic and inherited hreast cancers, but those that affect a 
breast cancer cell’s choice to proliferate, differentiate, or die 
are likely to he key factors in this process. 


Randomized trials and large meta-analyses clearly show 
that all breast cancer patients derive a statistically significant 
survival benefit from chemotherapy and endocrine therapy 
(1998;2002h;2002a; Fisher et al., 1996; Mansour et al., Q1 
1998), Tamoxifen (TAM; antiestrogen), Paclitaxel (taxane), 
and Adriamycin (anthracycline) being among the most 
effective single agents. The survival benefit gained from 
current systemic therapies largely reflects the abilities of 
cytotoxic and endocrine agents to modify cell survival such 
that cells are driven down an irreversible cell death pathway 
(Fischer and Schulze-Osthoff, 2005). Nonetheless, advanced 
hreast cancer largely remains an incurable disease, and new 
treatment regimens and schedules have led to only incre¬ 
mental decreases in breast cancer-^'elated mortality. A better 
understanding of the factors that regulate breast cancer cell 
survival or death is central to improving breast cancer 
outcomes in women. 

MOLECULAR MECHANISM OF APOPTOSIS 

Apoptosis, a type of programmed cell death (PCD), is an 
essential feature of normal mammary gland function. Failure 
to undergo apoptosis can lead to the development of cancer 
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in breast epithelial cells (Green and Streuli, 2004). Cancer 
cells can evade apoptosis by modifying the signaling path¬ 
ways that lead to apoptosis. Thus, significant effort is 
invested in the development of anticancer therapeutic agents 
that either selectively induce cell death in cancer cells or 
restore their apoptotic threshold (Meng et ah, 2006). 

The distinctive morphological and biochemical hallmarks 
of apoptosis include cell shrinkage, pyknosis (chromatin 
condensation), karyorhexis (nuclear fragmentation), mem¬ 
brane blebbing, and fragmentation of the cell into apoptotic 
bodies (Kerr et ah, 1972). Phagocytic cells recognize and 
remove the apoptotic bodies, and thus avoid immune activa¬ 
tion around the dying cell. Cell death by apoptosis requires 
the expenditure of ATP and the activation of proteases 
known as “caspases”(cysteine-dependent aspartate-specific 
proteases) (Wolf and Green, 1999). Caspases exist as latent 
zymogens that may be activated by autoactivation, trans¬ 
activation, or proteolysis by other proteinases. In humans, 
over a dozen caspases have been identified (Hengartner, 
2000); among these, caspases-3, -6, and -7 are called 
“execuhoner” caspases and they mediate their effect by the 
cleavage of specific cellular substrates. These executioner 
caspases are activated by the “initiator” caspases such 
as caspases-8, -9, and -10 (Denault and Salvesen, 2002; 
Riedl and Salvesen, 2007; Salvesen, 2002; Wolf and 
Green, 1999). 

Overview of Extrinsic and intrinsic Pathways 

Many anticancer therapies activate caspases that cleave a 
number of different but selective substrates in the cyto¬ 
plasm or nucleus, leading to many of the morphological 
features of an apoptotic cell death (Degterev et ak, 2003). 
Caspase activation is initiated at the plasma membrane 
through death receptors either by an “extrinsic” pathway, 
or at the mitochondria by an “intrinsic” pathway (Fig. 1). 



Figure 1 A simplified model of extrinsic and intrinsic signal¬ 
ing mechanisms involved in apoptotic cell death in the cell. 


Shajahan et al. 

In normal tissue, apoptosis maintains homeostasis, and 
this process is tightly controlled at critical points of the 
signaling cascade (Green and Kroemer, 2004; Kroemer 
et ak, 2007). 

In the extrinsic or death receptor pathway, stimulation 
of death receptors of the tumor necrosis factor (TNF) 
receptor superfamily such as CD95 (APO-1/Fas) or TNF- Q2 
related apoptosis-inducing ligand (TRAIL) receptors-1 
and -2 result in the recruitment and oligomerization of 
the adapter molecule FADD (Fas-associating death 
domain-containing protein). The oligomerized FADD 
then localizes within the death-inducing signaling com¬ 
plex (Debatin and Krammer, 2004) followed by activa¬ 
tion of initiator caspases-8, or -10 that contain death 
effector domains (Vandenabeele et ak, 2006; Wolf and 
Green, 1999). 

In the intrinsic or mitochondrial pathway, the execu¬ 
tioner caspases are activated by caspase-9, which is acti¬ 
vated by the adapter molecule apoptotic protease activating 
factor-1 (APAF-1) within a multiprotein complex called the 
“apoptosome.” Activation of APAF-1 depends on both 
cytochrome c release from the intermembrane space of 
the mitochondria and ATP/dATP (Cain et ak, 2002), lead¬ 
ing to an increase in mitochondrial membrane potential 
(MMP) that is closely controlled by pro-(BAD, BAX) and 
antiapoptotic (BCL2) members of the BCL2 family of 
proteins (Decaudin et ak, 1998; Green and Kroemer, 

2004). A caspase-independent signal can also originate 
from within the mitochondria leading to irreversible loss 
of mitochondrial function; this can include the release of 
caspase-independent death effectors such as apoptosis- 
inducing factor (AIF) or endonuclease G (Cande et ak, 

2004; Kroemer and Martin, 2005; Li et ak, 2001). 

The intrinsic and extrinsic pathways converge at the 
executioner caspases. The executioner caspases selec¬ 
tively cleave their substrates in the primary sequence 
(always after an aspartate residue), and these target 
proteins can range from single polypeptide chain 
enzymes (poly ADP-ribose polymerse, PARP) to com¬ 
plex macromolecules (the lamin network) (Hengartner, 
2000). PARP inactivation by caspase-specific cleavage, 
which forms an 89-kDa fragment, is a biochemical hall¬ 
mark of apoptosis. Members of the heat shock protein 
(HSP) family, e.g., HSP70, can delay apoptosis by pre¬ 
venting the nuclear import of AIF (Ravagnan et ak, 2001). 

The intrinsic drive for cancer cells to undergo apoptosis is 
held in check by inhibitor of apoptosis proteins (lAPs). 
Downstream of cytochrome c release, second mitochon¬ 
drial activator of caspases/direct lAP binder with low pi 
(Smac/DIABLO) neutralize lAPs such as X-linked lAP 
(XIAP), survivin, and Apollon through their baculoviral 
inverted repeat (BIR) domains, and so indirectly promote 
caspase activation (Saelens et ak, 2004; Vaux and Silke, 

2003). 
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Alternative Death Pathways 

Effectiveness of antineoplastic drugs can be assessed on 
the basis of their ability to induce apoptosis in tumor cells; 
however, it is now becoming evident that apoptosis may 
not be the only, or perhaps even the primary, mechanism 
of cell death in solid tumors (Brown and Attardi, 2005). 
Frequent failure to correlate apoptotic cell death with the 
effects of chemotherapeutic drugs on human tumors and 
cell lines (Brown and Attardi, 2005; Roninson et al., 2001) 
has prompted studies of other mechanisms of cell death. 

Autophagy 

Autophagy involves sequestration of cytosol and cytoplas¬ 
mic organelles within double membranes called autopha¬ 
gosomes or autophagic vacuoles. The vesicular contents 
are broken down by pH-sensitive lysosomal hydrolases 
and the degradation products are recycled for use in 
macromolecular synthesis and/or bioenergetics (Kroemer 
and Jaattela, 2005). In general, autophagy is important in 
the developmental remodeling of cells, cellular adaptation 
to nutrient deprivation, and the elimination of damaged 
organelles (Edinger and Thompson, 2004; Edinger and 
Thompson, 2003; Klionsky and Emr, 2000). Paradoxi¬ 
cally, autophagy can act both as a cell survival mechanism 
when extracellular nutrients or growth factors are limited 
and as an alternative cell death pathway to apoptosis (Jin, 
2006) (Fig. 2). Removal of organelles such as mitochon¬ 
dria, which are apoptotic mediators, may protect the cell 
against apoptosis. Beclin-1/ATG6 (BECNl) is a key regu¬ 
lator of autophagy (Furuya et al., 2005), and monoallelic loss 
of the BECN1 locus is seen in over 40% of breast cancers 
(Liang et al., 1999). BCL2 antiapoptotic proteins can block 
autophagy by inhibiting BECNl (Pattingre et al., 2005). 



Autophagy • Survival 

I\ 

•- Apoptosis -- Cell Death 

/ \ 


Survival Cell Death 


Figure 2 In response to a death signal, autophagy can prevent 
apoptosis to lead to survival or induce apoptosis to lead to cell 
death. The mechanism(s) that controls the balance between cell 
survival and cell death by autophagy, in response to a particular 
death signal, remains to be clarified. 


Thus, antiapoptotic members of the BCL2 family may 
function as oncogenes not only by directly blocking apop¬ 
tosis but also by blocking autophagy (Pattingre and Levine, 
2006). Although the early events in autophagy are revers¬ 
ible, later events may share mechanism(s) with other death 
pathways. For example, cleavage of ATG5 by calpain 
(Yousefi et al., 2006) or upregulation of BID (Lamparska- 
Przybysz et al., 2006) can switch from autophagy to apoptosis. 

Mitotic Catastrophe 

Mitotic catastrophe is a type of cell death that occurs 
during or shortly after failed mitosis, which may occur 
following treatment with microtubule stabilizing or desta¬ 
bilizing agents or DNA damage (Mansilla et al., 2006; 
Roninson et al., 2001). The morphological alterations that 
occur during mitotic catastrophe are distinct from those 
that occur during apoptosis. These changes include multi- 
nucleation or the products of micronuclei because of 
faulty checkpoints, DNA structure checkpoints, or the 
spindle assembly checkpoint (also known as mitotic 
checkpoint) (Castedo et al., 2004; Roninson et al., 
2001). The disruption of normal chromosome segregation 
of many chromosomes results in rapid cell death (Castedo 
et al., 2004). However, in the absence of cell death 
following mitotic catastrophe, the cell can divide asym¬ 
metrically, resulting in the generation of aneuploid daugh¬ 
ter cells (Kops et al., 2005). Hence, mitotic catastrophe 
prevents irregular mitosis and, in turn, avoids aneuploid- 
ization that could lead to oncogenesis (Castedo et al., 
2004; Kops et al., 2005). 

Necrosis 

A cell can undergo necrosis following physical damage or 
toxic insults when the intracellular level of ATP falls to a 
level incompatible with survival. The decision to undergo 
necrosis over apoptosis is dependent on the level of 
intracellular ATP, since apoptosis requires the presence 
of ATP, while necrosis results in ATP depletion (Nicotera 
et al., 1998). Morphologically, necrosis is identified by 
vacuolation of the cytoplasm, breakdown of the plasma 
membrane, and an induction of inflammation around the 
dying cell due to the release of cellular contents and 
proinflammatory molecules (Edinger and Thompson, 
2004). The increase in cell volume (oncosis) during 
necrosis results in rupturing of the plasma membrane 
and the unorganized breakdown of swollen organelles 
(Kroemer et al., 2007). The ability of necrotic cells to 
promote local inflammation can support tumor growth 
(Vakkila and Lotze, 2004). While necrosis was previously 
thought to be a passive form of cell death, over the past 
several years the idea that cellular signaling pathways 
can specifically initiate necrosis has gained momentum 
(Proskuryakov et al., 2003). 
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Senescence 

Cellular senescence was first identified as the state of 
permanent cell cycle arrest resulting from the replicative 
exhaustion of normal diploid cells in culture (Hayflick, 
1965). Senescent cells appear static but are metabolically 
active, and appear large and flat with vacuoles and a large 
nucleus. In contrast to quiescent cells (reversible cell 
cycle arrest), senescent cells are unresponsive to mito¬ 
genic stimuli and are identified by cellular increase in 
P-galactosidase activity (Dimri et al., 1995). Thus, their 
inability to respond to serum or growth factors prevents 
immortalization and subsequent neoplastic transformation 
of the senescent cells. While apoptosis kills and eliminates 
potential cancer cells, cellular senescence irreversibly 
arrests cell growth (Campisi, 2001). Although the signal¬ 
ing mechanism for cellular senescence remains undeter¬ 
mined, DNA damage regulation by tumor suppressor 
genes such as TP53 and RB and epigenetic regulation of 
gene expression clearly play crucial roles (Campisi, 2001; 
Narita, 2007). 

Telomeres are DNA-protein complexes that cap the 
end of linear eukaryotic chromosomes and range from 2 to 
15 kb in humans (Martens et al., 1998). Telomeres prevent 
chromosomes from degradation, recombination, fusing 
with other chromosomes, and from being mistaken for 
DNA double-strand breaks. The de novo synthesis of 
telomeres is dependent on the enzyme telomerase, a 
reverse transcriptase (Cech, 2004). In most human cells, 
telomerase activity is gradually downregulated over time, 
resulting in successive telomere shortening that can ulti¬ 
mately limit their ability to proliferate; this is known 
as “cellular senescence,” “mortality stage 1 (Ml),” or 
“replicative senescence,” since the maintenance of func¬ 
tional telomeres is crucial for continued proliferation. Inac¬ 
tivation of cell cycle checkpoint genes like TP53 can result 
in continued proliferation by bypassing the cell growth 
arrest in Ml, eventually leading to critically short telomeres 
and massive cell death or “mortality stage 2 (M2)” or 
“crisis.” Individual cells can evade M2 by maintaining their 
telomerase, resulting in immortal cancer cells that have 
been reported in 85% to 90% of human tumors (Kim et al., 
1994). Thus, inhibition of telomerase activity may be a 
useful approach for mechanism-based anticancer therapy 
(reviewed in (Zimmermann and Martens, 2007). 

ENDOCRINE THERAPIES 

Endocrine therapy is often chosen as the first line of 
therapy in estrogen receptor a (ERa) and/or progesterone 
receptor (PR)-positive breast cancer patients because of its 
established efficacy and safety profile. However, endo¬ 
crine resistance frequently arises. Two forms of endocrine 
resistance have been described: de novo resistance that is 
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evident at the initial exposure to endocrine therapy or 
acquired resistance that arises over time after initiation of 
endocrine therapy. Absence of ERa expression is the most 
common de novo resistance mechanism, whereas a com¬ 
plete loss of ER expression is relatively uncommon in 
acquired resistance (Clarke et al., 2003; Moy and Goss, 
2006). Improved knowledge of the mechanism of hor¬ 
monal resistance, and the relationship between estrogen 
signaling and cell growth pathways, could provide the 
basis for combining signaling pathway inhibitors with 
endocrine therapies. 

Antlestrogens, Aromatase Inhibitors, 
and Apoptosis 

Endocrine therapy, administered as an antiestrogen 
(e.g., TAM or Easlodex) or an aromatase inhibitor 
(e.g., Letrozole or Anastrazole), is the least toxic and 
most effective means to manage hormone-dependent breast 
cancers. Antiestrogens, and TAM in particular, have been 
the “gold standard” first-line endocrine therapy for over 
20 years. Newer antiestrogens such as Easlodex are also 
showing significantly improved activity relative to TAM 
and some aromatase inhibitors (Howell et al., 1995; Howell 
et al, 2002). Third generation aromatase inhibitors have 
emerged as viable alternatives to antiestrogens for first-line 
endocrine therapy; overall response rates are generally 
greater for aromatase inhibitors (Ferretti et al., 2006). 
Aromatase inhibitors also have a different mechanism of 
action and toxicity profile to antiestrogens, but whether this 
toxicity profile favors aromatase inhibitors over antiestro¬ 
gens is controversial (Ferretti et al., 2006). Aromatase 
inhibitors can only be given as single agents to postmeno¬ 
pausal women or to women who do not have functioning 
ovaries; antiestrogens can be given irrespective of a 
patient’s menopausal status. 

The antiestrogen TAM, a nonsteroidal triphenylethy- 
lene derivative, has been used in the treatment of breast 
cancer for over 30 years (Litherland and Jackson, 1988). 
Besides being an effective means of treatment for women 
with ERa-positive tumors, TAM reduces the incidence of 
disease in healthy women at high risk of developing breast 
cancer (Cuzick et al., 2003). TAM acts by interacting with 
and blocking the action of estrogen on ERa within breast 
tumors and is classified as a selective ER modulator. In 
addition to the inhibitory effects of TAM on proliferation 
of breast ductal epithelium, TAM can act as an agonist to 
maintain bone density and to reduce serum cholesterol and 
triglyceride levels (Osborne et al., 2000). Despite its 
beneficial activities, prolonged administration of TAM 
can increase the incidence of endometrial cancer in 
some postmenopausal women (Wilking et al., 1997). 
The partial agonist properties of TAM have prompted 
the use of Easlodex to substitute for TAM as first- or 
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217 second-line endocrine treatment (Hundred and Howell, 

218 2002; Howell, 2006). A steroidal analogue of 17P- 

219 estradiol, Faslodex (ICI 182,780; Fulvestrant) generally 

220 provides complete anatagonism without agonist effects, 

221 unlike TAM, which acts as a partial agonist. Faslodex 

222 prevents estrogen binding to ERa by inducing conforma- 

223 tional changes and an eventual reduction of cellular ERa 

224 following polyubiquitination; Faslodex is classified as a 

225 QS selective downregulator of ER (SERD) {{74}}. 

226 Antiestrogens primarily function through their ability 

227 to compete with estradiol (E2) for binding to ER and 

228 inducing growth arrest and cell death (Clarke et ah, 2001). 

229 The consequences of occupying ER with an antiestrogen 

230 depend upon cellular context (i.e., expression of other 

231 proteins like ER coreguators), which ER is occupied 

232 (ERa, ERp), and ligand structure (Clarke et ah, 2001). 

233 The importance of ERa is established; that of ERP is less 

234 clear (Speirs et ah, 2004). Higher ERp mRNA levels in 

235 resistant tumors have been reported (Arnold et ah, 1995), 

236 but this cannot be causally linked to endocrine resistance 

237 because ERP can also be associated with an aggressive 

238 phenotype (Dotzlaw et ah, 1999; Speirs et ah, 1999). 

239 Aromatase inhibitors are often classed as type 1 (ster- 

240 oidal inactivator, e.g., Letrozole, Anastrozole), or type 2 

241 (nonsteroidal inhibitor, e.g., exemestane). Third genera- 

242 tion aromatase inhibitors are highly effective in blocking 

243 enzyme activity and exhibit notable specificity (Miller, 

244 2004). These drugs act by blocking the ability of the P450 

245 CYP19A1 gene product (aromatase) to convert androgen 

246 precursors to estrone or estradiol. Estradiol is the most 

247 potent estrogen and is found in high concentrations in 

248 breast tumors irrespective of menopausal status or the ER 

249 status of the tumor (Clarke et ah, 2001; Clarke et ah, 2003). 

250 The extent of crossresistance among endocrine thera- 

251 pies is unclear. Since aromatase inhibitors can improve 

252 disease-free survival after two to three years of TAM as 

253 compared with a total of five years of TAM (Baum et ah, 

254 2003; Boccardo et ah, 2001; Coombes et ah, 2004; Jakesz 

255 et ah, 2005; Thurlimann et ah, 2005), it would seem that 

256 some TAM resistant tumors may retain sensitivity to an 

257 aromatase inhibitor. Second-and third-line responses to 

258 endocrine therapy have been widely documented, but 

259 lower response rates of shorter duration are usually 

260 observed with each successive line of treatment. 

261 Both TAM and Easlodex can induce apoptosis in cells 

262 by inhibiting survival signaling mediated through ERa 

263 QA {{82, 80, 81}}. However, the precise mechanism for 

264 inducing apoptosis remains controversial. Administration 

265 of TAM results in activation of caspases-3, -8, -9 within 

266 18 to 24 hours of drug treatment in rat mammary tumors 

267 (Mandlekar et ah, 2000a). In addition, the effects of TAM 

268 can be mediated through an ERa-independent mechanism 

269 that results in an early activation of JNKl followed by 

270 caspase activation (Mandlekar et ah, 2000b). TAM can 


141 

affect the level of proteins involved in cell growth 
including c-MYC (Kang et ah, 1996), protein kinase 
C (Gelmann, 1996), and transforming growth factor 
p (TGEP) (Perry et ah, 1995) by ERa-independent mech¬ 
anisms. However, since ER-negative tumors rarely 
respond to endocrine therapies (1998;2002a), these mech- Q'b 
anisms are not likely to be responsible for significant 
apoptotic cell death in vivo. 

About 25% of ER-positive/PR-positive tumors, 66% of 
ER-positive/PR-negative tumors and 55% of ER-negative/ 
PR-positive tumors fail to respond to TAM (Clarke et ah, 

2003; Moy and Goss, 2006; Osborne and Schiff, 2003). 

Better predictors of endocrine responsiveness are clearly 
required. Many initially sensitive tumors become resistant 
(acquired resistance) (Clarke et ah, 2001) and about one- 
third of all ER-positive breast tumors exhibit de novo 
endocrine resistance. The mechanisms of resistance to an 
antiestrogen remain unclear, reflecting a limited understand¬ 
ing of the signaling affecting cell proliferation, survival, and 
death and their hormonal regulation in breast cancer cells. 

Of current interest is identification of the optimum 
choice and scheduling of antiestrogens and aromatase 
inhibitors. Evidence clearly shows improvements in over¬ 
all response or disease-free survival for combined therapy 
(an aromatase inhibitor and an antiestrogen usually given 
sequentially) over single agent TAM (Baum et ah, 2003; 
Boccardo et ah, 2001; Coombes et ah, 2004; Jakesz et ah, 

2005; Thurlimann et ah, 2005). Recent data also imply 
that response rates are often greater to a first-line aroma¬ 
tase inhibitor than to TAM (Bonneterre et ah, 2000; QQ 
Mouridsen et ah, 2001). However, the ability of aromatase 
inhibitors to induce a significant improvement in overall 
survival is uncertain. A recent meta-analysis failed to 
show an advantage for aromatase inhibitors with respect 
to overall survival, despite clear evidence favoring aro¬ 
matase inhibitors in other end points (Ferretti et ah, 2006). 

Thus, the optimum first-line endocrine therapy remains 
controversial for some women, as does the optimum 
choice and scheduling of combination endocrine therapy. 
Whichever way these controversies are eventually 
resolved, it is clear that both aromatase inhibitors and 
antiestrogens will remain as key modalities in the man¬ 
agement of ER-positive breast cancers. 

Understanding the signaling mechanism involved in 
antiestrogen-induced apoptosis is closely related to our 
knowledge of antiestrogen resistance. Through ongoing 
research in our laboratory, we have established interferon 
regulatory factor-1 (IRE-1) as a key node in a putative 
signaling network associated with responsiveness to endo¬ 
crine therapy, where IRF-1 modifies ERa-mediated sig¬ 
naling to apoptosis (Bouker et ah, 2004; Clarke et ah, 

2003; Gu et ah, 2002). IRF-1 and a dominant negative 
IRF-1 (dnIRF-1) induce opposing effects on proliferation 
in vitro and tumorigenesis in vivo through regulation of 
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caspase-3/7 and caspase-8 activities (Bouker et al., 2005). 
While TP5 3-dependent apoptosis occurs in the breast 
(Tu et al., 2005), T47D cells express mutant TP53 and 
our data show that TP53 is not required for the proapoptotic 
actions of IRF-1 (Bouker et al., 2004; Bouker et al., 2005). 
Expression of lRF-1 is reduced in neoplastic versus normal 
human breast (Doherty et al., 2001), and there is an inverse 
correlation between IRF-1 and tumor grade (Connett et al., 
2005). In a study of mostly ERa-positive breast tumors, 
nuclear expression of IRF-1 is negatively correlated with 
NFkB expression, suggesting their expression pattern to be 
consistent with other genes implicated in our signaling 
network for endocrine resistance (Zhu et al., 2006). 

Upregulation of NFkB is associated with E2- 
independence (Clarkson and Watson, 1999; Nakshatri 
et al., 1997) and antiestrogen resistance (Gu et al., 2002; 
Riggins et al., 2005). The NFkB p50/p65 heterodimer 
complex comprises two homologous proteins encoded by 
different genes; the p50 product of its pi05 precursor 
(NFkBI) and NFkB p65 (RELA). While the predominant 
form in breast cancer cell lines is NFkB (p50/p65), 
another member of the family (p52) also is expressed in 
some breast cancers (Cogswell et al., 2000). Perhaps 
reflecting its regulation by both E2 and growth factors 
(Biswas et al., 2000; Nakshatri et al., 1997), which are 
also involved in endocrine resistance (Clarke et al., 2001; 
Dickson and Lippman, 1995), normal mammary gland 
development appears to be dependent on NFkB (Clarkson 
and Watson, 1999). NFkB is maintained in the cytosol in 
an inactive state, e.g., complexed with members of the IkB 
family that either inhibit nuclear transport or block 
NFkB’s nuclear translocation signal (Tam and Sen, 
2001). Generally, activation proceeds by phosphorylation 
of IkB by the IKK kinase complex, which results in the 
ubiquitination and degradation of IkB (Yaron et al., 1998). 
Elevated NFkB activity arises during neoplastic transfor¬ 
mation in both the rat (Kim et al., 2000) and mouse 
mammary gland (Tonko-Geymayer and Doppler, 2002). 

Antiestrogens, Aromatase Inhibitors, Autophagy, 
and the Unfoided Protein Response 

While apoptosis is clearly implicated (Bouker et al., 2004; 
Gaddy et al., 2004; Kyprianou et al., 1991), some of the 
apoptosis end points in prior studies may not distinguish 
among earlier events that are more closely linked to 
signaling initiated through autophagy. Autophagy does 
occur in response to endocrine therapy (Bursch et al., 
1996; Inbal et al., 2002). However, given very recent 
advances in the understanding of cell death mechanisms, it 
is not clear if signaling to both apoptosis and autophagy 
are involved in regulating cell survival, and/or if the initial 
signaling involves autophagy but later events include 
signaling to apoptosis. 


One cell signaling process that may integrate auto¬ 
phagy and apoptosis in this context is the unfolded protein 
response (UPR), a key component of the endoplasmic 
reticulum stress response (Ron, 2002) and an adaptive 
signaling pathway that allows cells to survive the accu¬ 
mulation of unfolded proteins in the endoplasmic retic¬ 
ulum (Zhang and Kaufman, 2006). Initially a mechanism 
for allowing cells to recover normal endoplasmic retic¬ 
ulum function, prolonged UPR can induce cell death. UPR 
is activated by three molecular sensors: IREla, ATF6, and 
PERK (DuRose et al., 2006). Splicing of X-box binding 
protein 1 (XBPl) by IREla is obligatory for IREla- and 
ATF6-induced UPR (DuRose et al., 2006; Yoshida et al., 
2001). XBPl is a transcription factor; the unspliced form, 
XBPl(U), has a molecular weight (Mr) of approximately 
33 kDa and acts as a dominant negative (Fee et al., 2003; 
Sriburi et al., 2004). The spliced, active form, XBPl(S), 
has a Mr of approximately 54 kDa. A very recent study 
shows that the UPR (initiated by XBPl splicing) can 
induce autophagy (Ogata et al., 2006). Whether this is a 
prosurvival or prodeath form of autophagy is unknown, 
since UPR can induce both prodeath and prosurvival 
outcomes (Feldman et al., 2005). In MCF7 and T47D 
breast cancer cells, we have shown that overexpression of 
XBPl(S) prevents antiestrogen-induced cell cycle arrest 
and cell death via the mitochondrial apoptotic pathway. 
Therefore, XBPl may be a useful molecular target for the 
development of novel predictive and therapeutic strategies 
in breast cancer (Gomez et al., 2007, in press). Figure 3 
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Figure 3 Some of the signaling mechanisms involved in 
endocrine resistance in breast cancer cells. Abbreviations: Inter¬ 
feron regulatory factor 1, IRF-1; nuclear factor kappa B, NFkB; 
X-box binding protein 1, XBPl; unfolded protein response, 
UPR; B-cell CLL/lymphoma 2, BCL2; beclin 1, BECNl. 
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shows some of the signaling mechanisms involved in 
endocrine resistance in breast cancer, based on the research 
done by others and our group. 

CYTOTOXIC THERAPIES 
Anthracycllnes 

Anthracyclines such as Doxorubicin are widely used in 
breast cancer treatment for the treatment of metastatic 
breast cancer. The mechanisms for the antineoplastic activ¬ 
ities of doxorubucin are complex and include intercalation 
with DNA, direct cell membrane effects, initiation of DNA 
damage, apoptosis through inhibition of topoisomerase II, 
and the production of reactive oxygen species (Minotti 
et al., 2004). Despite its efficacy. Doxorubicin has several 
undesirable side effects, especially a cumulative cardiac 
toxicity (Singal and Iliskovic, 1998) (Table 1). 

Doxorubicin-induced apoptosis occurs through a dif¬ 
ferent signal transduction mechanism in nontransformed 
cells, such as endothelial cells and cardiomyocytes (H 2 O 2 - 
dependent), when compared with tumor cells that harbor a 
functional TP53 (Wang et al., 2004). Thus, targeted drug 
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therapy could minimize cytotoxicity in normal cells and 
maximize cell death in cancer cells. The NF-kB/BCL 2 
pathway is a possible mechanism for tumor resistance to 
anthracyline-based chemotherapy. In breast tumor sam¬ 
ples from patients treated with neoadjuvant Doxorubicin- 
based chemotherapy, nuclear localization of NF-kB is 
associated with expression of BCL2 and BAX. Moreover, 
the NF-kB/BCL 2 pathway may be associated with a poor 
response to neoadjuvant Doxorubicin-based chemotherapy 
(Buchholz et al., 2005). 

Alkylating agents 

Cyclophosphamide (CPA) is an alkylating agent that 
alkylates DNA, forming DNA-DNA cross-links that result 
in an inhibition of DNA synthesis and cell death. CPA is a 
prodrug that is metabolically activated in the liver by 
cytochrome P450 enzymes. Activated CPA metabolites, 
e.g., hydroxyl-CPA, are transported via the bloodstream 
to both tumor and healthy tissues, where DNA and 
protein damage can occur (Moore, 1991). Cisplatin (cis- 
diaminedichloroplatinum II), another alkylating agent, 
exerts its cytotoxic effects by reacting with DNA to 


Table 1 


Drug Primary target Selected cell death signaling 


Endocrine agents 

Faslodex, Tamoxifen Estrogen receptor a 


Anastrazole, Letrozole 

CYP19A1 aromatase 

Anthracyclines 

Doxombicin, Epirubicin 

DNA intercalation, 
topoisomerase 11 

Alkylating agents 

Cisplatin, Cyclophosphamide 

DNA crosslinking 

Antimetabolites 

5-fluorouracil, Capecitabine 

Thymidylate synthase 

Microtubule inhibitors 

Docetaxel, Paclitaxel 

Microtubule stabilization 

Vinorelbine, Vincristine 

Microtubule dissolution 

Signal transduction inhibitors 

Gefitinib 

EGER kinase activity 

Trastuzumab, CH401 

HER2 extracellular domain 

Dasatinib, AZD0530 

c-Src kinase activity 

Genasense 

BCL2 

ABT-737, others 

BCL2 and BCLX 

Bortezomib 

26S proteasome 

Bevacizumab 

VEGFR2 


Caspase activation, BCL2 downregulation, INK 
activation caspase activation, BAK upregulation 


BCL2 family regulation, NFkB inhibition, TP53 
activation 

Caspase activation, TP53 activation, cytochrome c release 

TP53 activation, thymineless death 

Caspase activation, phosphorylation of BCL2 and BCLX, 
INK activation, CD95/FAS expression 
TP53 activation, posttranslational modification of BCL2 
family members 

Phosphorylation of BAD, downregulation of BCL2 
Inhibition of P13K/AKT, phosphorylation of BAD, 

INK activation 

Inhibition of P13K/AKT, downregulation of BCLX 
Downregulation of BCL2 
Prevention of BCL2 and BCLX interaction with 
proapoptotic BAX and BAK 
Inhibition of P13K/AKT, INK activation, sensitization 
to TRAIL 

DNA fragmentation, inhibition of P13K/AKT 
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yield a variety of adducts, the most common adduct being 
an intrastrand cross-link between adjacent guanines (Perez, 
1998). Cisplatin can be administered as a first-line therapy 
for metastatic breast carcinoma (Sledge Jr., et al., 1988) 
or in combination with other chemotherapeutic drugs 
(Kourousis et al., 1998; Nagoumey et al., 2000). 

Besides targeting genomic DNA, Cisplatin can also 
bind to mitochondrial DNA, interact with phospholipids 
and phosphatidylserine in membranes, disrupt the cytos- 
keleton, and affect the polymerization of actin (Jamieson 
and Lippard, 1999). Thus, Cisplatin interaction with 
proteins is also a possible mechanism for Cisplatin- 
induced apoptosis (Perez, 1998). Cisplatin can activate 
caspases by stabilizing TP53 and releasing cytochrome c 
from mitochondria (Siddik, 2002). Cell death by both 
apoptosis and necrosis has been found in the same pop¬ 
ulation of ovarian cancer cells treated with Cisplatin 
(Pestell et al., 2000). Studies in MCF7 breast cancer cell 
have shown that antisense BCL2 and Cisplatin combina¬ 
tion therapy could potentially be useful in treating breast 
cancer overexpressing BCL2, perhaps by activating 
caspase-8 independent of TP53 status (Basma et al., 
2005). High doses of Cisplatin (>312 pM) can damage 
molecules involved in cellular energy supply (such as 
ATP) or proteins involved in apoptosis (such as TP53, 
caspases, BCL2, and BAX), leading to necrotic cell death. 
Thus, dose of cisplatin or the context of the target cells 
could direct the mode of cell death either by a defective 
apoptotic program or by necrosis (Gonzalez et al., 2001). 
Mechanisms of resistance to Cisplatin include loss of 
damage recognition, overexpression of HER-2/neu, activa¬ 
tion of the PI3K/AKT (also known as PI3K/PICB) pathway, 
loss of TP53 function, overexpression of BCL2, and inter¬ 
ference in caspase activation (Siddik, 2002). 

Antimetabolites 

Antimetabolites, particularly the fluoropyrimidine 5- 
fluorouracil (5-FU), are widely used as chemotherapeutic 
agents in the treatment of breast cancer. Within the cell, 
5-FU is metabolized to 5-fluoro-deoxyuridine monophos¬ 
phate (FdUMP) that interacts with and inhibits thymidy- 
late synthase (TS) and prevents the formation of 
thymidine 5'-monophosphate (dTMP), thus inhibiting 
DNA synthesis (Chu et al., 2003; Longley et al., 2003). 
Thymidine phosphorylase (TP) mediates the conversion of 
5-FU into fluorouridine diphosphate (FUDP) by transfer 
of a ribose phosphate from phosphoribosylpyrophosphate 
(PRPP) performed by orotic acid phosphoribosyltransfer- 
ase (OPRTase). FUDP can be phosophorylated to FUTP 
and incorporated into RNA-by-RNA polymerase. FdUMP 
is converted to fluorodeoxyuridine triphosphate (FdUTP) 
and is directly incorporated into DNA (Chu et al., 2003; 
Longley et al., 2003). 
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The cytotoxic effect of 5-FU can occur through a 
process called thymineless death (Houghton et al., 
1997). At the level of DNA synthesis, FdUMP acts as a 
competitive inhibitor of TS that results in a dTTP/dUMP 
cellular pool imbalance with subsequent DNA damage 
(Hernandez-Vargas et al., 2006; Parker and Cheng, 1990). 
Breast cancer patients with high pretreatment levels of TS 
protein show increased response to 5-FU-based chemo¬ 
therapy (Nishimura et al., 1999). In addition, inhibition of 
RNA metabolism (Longley et al., 2003) and interference 
with polyamine metabolism (Zhang et al., 2003) could 
also contribute to the antiproliferative effects of 5-FU. In 
response to 5-FU treatment of MCF7 breast cancer cells, 
increased expression of TP53 target genes that are 
involved in cell cycle and apoptosis including CDKNIA 
(CIPl, WAFl), TP531NP, CD95/FAS, and BBC3/PUMA, 
along with significant repression of MYC (Hernandez- 
Vargas et al., 2006). Low doses of 5-FU (IC 50 , 10 pM) 
result in cell cycle arrest, while high doses (ICgo, 500 pM) 
result in apoptosis in breast cancer cells (Hernandez-Vargas 
et al., 2006). Thus, the TP53 status of tumors and tissue 
concentration of 5-FU could be important determinants of 
drug efficacy in breast cancer treatment. Capecitabine is an 
example of a rationally designed cytotoxic drug that gen¬ 
erates 5-FU preferentially in tumor cells by exploiting their 
higher activity of the activating enzyme TP compared with 
healthy tissues. The use of such targeted therapies increases 
efficacy and minimizes toxicity. Capecitabine has good 
activity and a favorable safety profile when used for the 
treatment of metastatic breast cancer (Yarden et al., 2004). 

Microtubule Inhibitors 

Microtubules form an integral part of the cytoskeleton and 
consist of a- and (3- tubulin heterodimers. Taxanes (e.g., 
Taxol/Paclitaxel, Taxotere/Docetaxel) and vinca alkaloids 
(e.g.. Vincristine, Vinorelbine) target microtubules and 
are often used to treat breast cancer. However, these drugs 
have very distinct effects on microtubule stability. While 
taxanes promote tubulin polymerization and stabilization 
(Schiff et al., 1979), vinca alkaloids inhibit tubulin poly¬ 
merization (Johnson et al., 1960). Cells can die through 
either apoptotic or nonapoptotic pathways, but the precise 
signaling pathways involved are not completely elucidated. 

Paclitaxel induces aberrant mitotic spindle formation 
(Fuchs and Johnson, 1978) that results in G2-M cell cycle 
arrest followed by cell death. At concentrations lower than 
those required for microtubule disruption {2-A nM), tax¬ 
anes can inhibit cell growth, implying the existence of 
alternative mechanisms of action (Ganansia-Leymarie 
et al., 2003; Hernandez-Vargas et al., 2007). Several 
proapoptotic signaling mechanisms are implicated in 
response to antimicrotubule agents that appear independent 
of microtubule binding, such as phosphorylation of and 
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binding to BCL2 /BCLXl antiapoptotic proteins (Haidar 
et al., 1997), activation of jun N-terminal kinase (INK) 
(Wang et al., 1998) and RAFl kinase (Blagosklonny et al., 
1996). Other apoptosis-related proteins including TP53, 
CDKNIA, and BAX, and caspases are involved in apoptotic 
cell death process induced hy taxanes (Ganansia-Leymarie 
et al., 2003). Phosphorylation of caveolin-1, a regulator of 
proapoptotic proteins, increases Paclitaxel sensitivity in 
MCF7 breast cancer cells by facilitating BCL2 phosphor¬ 
ylation and regulating the induction of CDKNIA (Shajahan 
et al., 2007). The death receptor CD95/FAS, a mediator of 
apoptosis, is induced after taxane therapy (Hernandez- 
Vargas et al., 2007). Treatment of hreast cancer cells with 
low levels of Docetaxel (2^ nM) triggers necrosis, while a 
higher concentration of the drug (100 nM) induces apoptosis 
(Hernandez-Vargas et al., 2007). Moreover, cell death by 
autophagy has been described as a form of PCD following 
treatment with taxanes in breast cancer cells (Gorka et al., 
2005). Thus, both apoptotic and nonapoptotic pathways are 
likely responsible for cell death in response to taxanes. 

Vinorelhine is an antimitotic drug that impairs chromo¬ 
somal segregation during mitosis hy blocking cell cycle at 
G2/M phase. Like other mitotic poisons, vinorelhine can 
induce apoptosis in cancer cells but an understanding of the 
signaling mechanism(s) of cell death mediated by vinca 
alkaloids is incomplete. Disruption of microtubules by 
vinca alkaloids can cause induction of TP53 and regulation 
of a number of proteins involved in apoptosis including 
CDKNIA, RAS/RAF, and PKC/PKA (Wang et al, 1999). 
As is the case with taxanes, BCL2 phosphorylation/ 
inactivation plays a role in apoptosis induction (Haidar 
et al., 1995), resulting in a decrease in BCL2 inhibition 
of the proapoptotic protein BAX (Wang et al., 1999). 


TARGETED INHIBITORS OF SIGNALING 
TRANSDUCTION PATHWAYS 

In addition to the conventional cytotoxic and endocrine 
therapies discussed above, novel signal transduction 
inhibitors are emerging as valuable tools for inducing 
breast cancer cell death in vitro and in vivo. The precise 
molecular targets of these agents are varied, ranging from 
cell surface receptors to components of the proteasomal 
degradation pathway (recently reviewed in (Bremer et al., 
2006). The purpose of these agents is to induce death in 
malignant cells while leaving normal tissue relatively 
unaffected, a goal that is potentially achievable if the 
signaling pathways contributing to malignant transforma¬ 
tion are known. Given that a comprehensive analysis of all 
such agents as they pertain to breast cancer therapy is 
beyond the scope of this review, we will instead focus on 
three broad classes of molecules and their targeted thera¬ 
pies that have shown promise in the clinic. 


145 

The epidermal growth factor receptor (EGFR) and 
its family member HER2 are important mediators of 
breast cancer cell proliferation and survival [reviewed in 
(Badache and Gonsalves, 2006)]. Overexpression of 
EGER and HER2 are associated with poor prognosis. 
While HER2 is amplified in approximately one-third of 
breast cancers, it is rare for either EGFR or HER2 to 
exhibit activating mutations (Diehl et al., 2007). Other 
receptors, such as insulin-like growth factor 1 receptor 
(IGF-IR), TGFP receptor, and vascular endothelial 
growth factor receptor (VEGFR; see below) are additional 
targets that are receiving increased attention for their 
potential in the treatment of breast cancer. 

Downstream of growth factor receptors, many intra¬ 
cellular signaling molecules coordinate the aberrant pro¬ 
growth and prosurvival signals that lead to tumorigenesis. 
Other gene products are coupled to death receptors that 
transduce proapoptotic signals, while still others regulate 
cell proliferation and survival independent of growth 
factor receptors. The nonreceptor tyrosine kinase c-Src 
(Src) is overexpressed in approximately 70% of breast 
cancers, and the associated increase in its activity con¬ 
tributes significantly to tumor cell survival (recently 
reviewed in (Ishizawar and Parsons, 2004). The expres¬ 
sion of intracellular signaling partners for TNF, FAS, 
TRAIL, and other cell surface death receptors may also 
be reduced, leaving cancer cells unable to respond to 
extrinsic apoptotic signals. More globally, deregulated 
expression of apoptotic gatekeepers like BCL2, or the 
ubiquitin-proteasome pathway, can disrupt the entire cell 
death program (Bremer et al., 2006). 

In addition to the targeted therapies that directly disrupt 
cancer cell proliferation, others are designed to inhibit a 
tumor’s ability to grow and spread by other means. Solid 
tumors such as breast cancer are particularly dependent on 
the formation of new blood vessels and capillary net¬ 
works, both to supply fresh oxygen and nutrients and 
remove metabolic waste. The generation of new vascular 
tissue is known as angiogenesis; VEGF strongly induces 
angiogenesis by binding to the VEGFR2 expressed on 
endothelial cells. Many cancers, including those of the 
breast, overexpress VEGF, and agents that block angio¬ 
genesis via this signal transduction pathway have the 
capacity to induce apoptosis by starving the tumor 
[reviewed in (Schneider and Sledge, Jr., 2007). 


Growth Factor Receptor Inhibitors 

Preclinical studies of growth factor receptor inhibitors 
have revealed the potential of these agents to induce cell 
death by various means. The kinase inhibitor Gefitinib 
(ZD1839, Iressa) is an antagonist for EGFR but also has 
some activity toward HER2, and has been shown to 
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induce apoptosis in breast cancer cell lines through the 
mitochondrial or intrinsic apoptotic pathway. These 
effects arise either by decreased phosphorylation of proa- 
poptotic BAD [reviewed in (Motoyama and Hynes, 
2003)], or by down regulation of prosurvival BCL2 
(Okubo et al., 2004). Okubo et al. also show that Gefitinib 
enhances apoptosis induced by the steroidal antiestrogen 
Faslodex in ER-positive breast cancer cell lines. Others 
have reported that Gefitinib can restore antiestrogen sen¬ 
sitivity in MCF7/HER2 xenografts that have become 
resistant to either Faslodex or estrogen deprivation 
(Massarweh et al., 2006). EGFR inhibition also has the 
potential to enhance or restore sensitivity to other chemo¬ 
therapeutic agents. MCF7/ADR breast cancer cells are 
resistant to multiple drugs because of overexpression of 
the transport pump gpl70/MDR7, and they have also been 
shown to express high levels of EGFR and its ligand 
TGFa (Ciardiello et al., 2002). However, Gefitinib is able 
to restore Paclitaxel- and Docetaxel-mediated apoptosis in 
MCF7/ADR cells, despite persistent expression of the 
transporter. Gefitinib may also prove useful in the treat¬ 
ment of high-risk women with ductal carcinoma in situ 
(DCIS); ER-positive and ER-negative DCIS grown as 
subcutaneous xenografts in nude mice are both sensitive 
to growth inhibition and the induction of apoptosis by this 
drug (Chan et al., 2002). 

Inhibition of HER2 using the targeted monoclonal anti¬ 
body Trastuzumab can also induce apoptosis in breast cancer 
cells through blocking HER2-mediated phosphoinositol 
3-kinase (PI3K) and AKT activity. HER2-induced PI3K/ 
AKT activity would normally promote cell survival by 
phosphorylating and inhibiting the proapoptotic functions 
of BAD (Badache and Gonsalves, 2006; Meric-Bemstam 
and Hung, 2006; Zhou and Hung, 2003). In addition, a novel 
inhibitory antibody targeted toward HER2 (CH401) has 
been shown to not only inhibit PI3K/AKT activities but 
also to directly induce apoptosis through modulation of INK 
and p38 in gastric cancer cells (Hinoda et al., 2004). Whether 
this antibody will show similar activity toward HER2 in 
breast cancer remains to be determined. 

Other growth factor receptor inhibitors have shown 
promise in both in vitro and preclinical studies. IGE-IR 
regulates cell proliferation, survival, and migration in 
multiple cancer models, using intracellular signal trans¬ 
duction pathways similar to those utilized by EGFR and 
Q9 HER2 (e.g., PI3K/AKT and RAS/MAPK) [recently 
reviewed in (Sachdev and Yee, 2007)]. However, unlike 
these receptors, IGF-IR signaling events cannot be 
induced by overexpression of the receptor and are exclu¬ 
sively ligand dependent. Therefore, strategies for inhibi¬ 
tion of this pathway focus on blocking expression and/or 
function of the IGF-I ligand and its receptor. Humanized, 
single-chain antibodies directed against IGF-IR can 
induce downregulation of receptor expression in MCF7 


breast cancer cells in vitro and grown as xenografts 
(Maloney et al., 2003; Sachdev et al., 2003). Over-expression 
of signal-inhibitory IGF binding proteins is also reported to 
inhibit IGF-IR signaling and cell growth while inducing 
breast cancer cell death [reviewed in (Maloney et al., 2003; 
Sachdev and Yee, 2007)]. 

Finally, TGFP signaling represents another promising 
target for specific inhibition in breast cancer. It is well 
established that the type 1 TGFP receptor can activate 
PI3K survival pathways and promote mammary epithelial 
cell survival (Muraoka-Cook et al., 2006; Yi et al., 2005). 
In the context of overexpressed HER2, TGFpl potently 
stimulates cell migration (Ueda et al., 2004). In addition, 
Arteaga et al. have shown that inhibitory antibodies to 
TGFP2 can prevent the growth of established MCF7/ 
LCC2 TAM-resistant xenografts (Arteaga et al., 1999). 
A number of approaches have been explored for inhibiting 
the TGFP pathway, including small-molecule inhibitors, 
antisense oligonucleotides, and monoclonal antibody ther¬ 
apy (recently reviewed in (Lahn et al., 2005). One of these 
compounds, LY-580276, has been shown to significantly 
inhibit tumor development in MXl xenografts, a model of 
ER-negative/PR-negative breast cancer. More recently, 
the TGFP receptor kinase inhibitor SB-431542 has been 
demonstrated to abrogate epithelial-to-mesenchymal tran¬ 
sition and wound healing in mouse mammary epithelial 
cells and the ER-negative/PR-negative MDA-MB-231 
breast cancer cell line (Haider et al., 2005). Cell prolifer¬ 
ation, as measured by [^H] thymidine incorporation, is 
also inhibited under these conditions. However the molec¬ 
ular mechanism of cell death induced by these agents is 
unknown, and it is also important to consider that in many 
cell types TGFP signaling has growth-inhibitory rather 
than growth-promoting effects. 

Intracellular Signaling Inhibitors and Activators 

Growth factor and other cell surface receptors can use 
diverse intracellular signaling partners to control cell growth 
and apoptosis. Receptors on the surface of a tumor cell can 
make for a relatively easy target for molecular inhibition. 
However, if the normal function and regulation of the 
intracellular signaling network(s) upon which these recep¬ 
tors rely is also disrupted, inhibiting only the receptor will be 
often ineffective. For example, altered downstream signal¬ 
ing may explain, in part, why the in vivo efficacy of EGFR 
and HER2 inhibitors has been mixed (Johnston, 2006). The 
use of agents designed to target intracellular signaling 
partners such as c-Src, BCL2, components of the protea- 
some, and transcription factors such as signal transducer 
and activator of transcription (STAT), STAT3 and 
STAT5, is likely to improve our ability to selectively 
induce apoptosis in breast cancer, perhaps in combination 
with receptor-targeted therapies. 
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541 c-Src’s overexpression in a large percentage of breast 2007; Tolcher et al., 2004). More recently, small molecule 

542 cancers makes it an attractive target for therapeutic inter- inhibitors of BCL2 function have been developed that are 

543 vention [reviewed in (Ishizawar and Parsons, 2004)]. It is designed to disrupt its interaction with (and inhibition of) 

544 well established that EGFR and c-Src kinase activities proapoptotic family members BAK and BAX. HA14-1 

545 synergistically promote breast cancer cell growth and and YC137 are early-generation BCL2 inhibitors that 

546 survival (Biscardi et ah, 2000). c-Src activity can also have been shown to induce apoptosis alone and in com- 

547 be regulated by physical and functional interactions with bination with other drugs in breast cancer cell lines (Real 

548 other proteins that include pl30Cas (Burnham et ah, et ah, 2004; Witters et ah, 2007). A newer BCL2/BCLX 

549 2000), and this has recently been shown to play a critical inhibitor (ABT-737) that has recently been developed may 

550 role in antiestrogen resistance in breast cancer (Riggins also prove useful in the sensitization of breast cancer to 

551 et ah, 2006), in part through attenuating TAM-induced apoptosis induced by chemotherapeutic drugs (Dai and 

552 apoptosis. Despite these findings, there are currently no Grant, 2007). 

553 c-Src-specific inhibitors in clinical use for the treatment of The ubiquitin-dependent proteasome degradative path- 

554 breast cancer. Dasatinib (BMS-354825) is a dual inhibitor of way plays a critical role in maintaining appropriate cellular 

555 c-Src and the Abl kinase which has recently been approved function by regulating the stability of signaling proteins. 

556 for use in chronic myelogenous leukemia (CML), and this The classical pathway relies on conjugation of multiple 

557 agent has also recently been shown to inhibit the growth of ubiquitin moieties to the target protein, which is then 

558 “triple negative” breast cancer cell lines that lack expression degraded by the 26S proteasome. In contrast, monoubi- 

559 of ER, PR, and EGER, a subset of tumors that is often quitination appears to regulate endocytosis and/or traf- 

560 difficult to treat (Einn et ah, 2007). AZD0530 is another ficking within the nucleus [reviewed in (Ohta and Eukuda, 

561 dual c-Src/Abl inhibitor that has been shown to inhibit 2004)]. Both of these processes are active in breast cancer, 

562 breast cancer cell growth and motility in vitro (Hiscox and many important mediators of breast cancer cell prolif- 

563 et al., 2006). In models of lung cancer and CML, Dasatinib eration and apoptosis are targets of the ubiquitin-proteasome 

564 can reduce AKT activity and expression of the prosurvival pathway, including EGFR and ER (Marx et al., 2007). 

565 protein BCLX (Nam et ah, 2007), although it is not yet Consequently, there is considerable interest in studying the 

566 known whether this also occurs in breast cancer. ability of proteasome inhibitors for the treatment of breast 

567 Growth factor receptors and c-Src share the STAT cancer (Dees and Orlowski, 2006). Bortezomib (PS-341, 

568 family of proteins, specifically STAT3 and STAT5 as Velcade) is a small-molecule peptide mimetic that inhibits 

569 QIO targets. These transcription factors regulate apoptosis in the 20S core of the proteasome that has been approved 

570 normal mammary gland development (Clarkson et al., for the treatment of multiple myeloma (Voorhees and 

571 2006) and therefore are important players in breast cancer. Orlowski, 2006). In SKBR3, MDA-MB-453, MCF7 

572 However, their lack of enzymatic and ligand binding and MCF7/HER2 breast cancer cell lines, Bortezomib 

573 activities makes them difficult to target. Multiple alterna- can enhance apoptosis induced by the HER2 inhibitor 

574 tive approaches have been considered for the targeted Trastuzumab (Cardoso et ah, 2006) and downregulate 

575 inhibition of STAT3 and STAT5, including small hairpin AKT activity, while stimulating apoptotic INK activity, 

576 RNA, peptide mimetics that prevent binding to signaling in combination with the multikinase inhibitor sorafenib 

577 partners, and small molecules such as sulindac and cyclo- in MDA-MB-231 breast cancer cells (Yu et ah, 2006). 

578 oxygenase inhibitors [reviewed in (Desriviereset ah, 2006)]. Brooks et al. have shown that Bortezomib can sensitize 

579 The prosurvival protein BCL2 and components of the some breast cancer cell lines to apoptosis induced by 

580 ubiquitin-proteasome system represent two signaling path- TRAIL, but this does not occur in all breast cancer models 

581 ways that are essential to the proper regulation of apop- (Brooks et ah, 2005). In a phase I study of Bortezomib in 

582 tosis. Like STAT3/5 these also lack enzymatic activity, 12 patients with metastatic breast cancer, the drug showed 

583 making them more difficult to target in the treatment of minimal toxicity but was not efficacious in either stabiliz- 

584 hreast and other cancers. However, inhibition of BCL2 by ing disease or improving outcome (Yang et ah, 2006). It is 

585 the antisense oligonucleotide Genasense (ohlimersen likely that this, and other signal transduction inhibitors, 

586 sodium, G3139) has been successful in improving the will need to be used in combination with other therapeu- 

587 sensitivity of breast cancer preclinical models to apoptosis tics to be a viable treatment option for breast cancer. 

588 induced by conventional chemotherapies [reviewed in 

589 (Nahta and Esteva, 2003)], even in models of multiple- Antiangiogenic Agents 

590 drug-resistant breast cancer (Lopes de Menezes et ah, 

591 2003). Genasense has also been studied in phase I clinical Specific inhibition of VEGFR2 is a common approach for 

592 trials in combination with standard chemotherapeutics in targeting angiogenesis in solid tumors, including those of 

593 many types of cancer including breast (Marshall et ah, the breast. VEGFR2 is expressed in the vasculature, where 

594 2004) and hormone-refractory prostate cancer (Kim et ah, proliferation during the process of angiogenesis can be 
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stimulated in a paracrine manner by VEGF expressed by 
the tumor. Bevacizumab (Avastin) is a humanized mono¬ 
clonal antibody directed against the VEGF ligand and the 
pursuit of similar angiogenesis inhibitors is an active 
focus of clinical breast cancer research [reviewed in 
(Schneider and Sledge Jr., 2007)]. Bevacizumab alone or 
in combination with Doxorubicin significantly inhibits 
VEGFR2 activation and angiogenic measures such as 
vascular permeability in women with inflammatory breast 
cancer (Wedam et al., 2006), and has been shown to 
significantly improve disease-free survival when com¬ 
bined with Paclitaxel in the treatment of metastatic breast 
cancer (Miller et al., 2005). Wedam et al. also showed that 
Bevacizumab induces apoptosis in vivo, as measured by 
terminal deoxynucleotidyl transferse-mediated dUTP 
nick-end labeling (TUNEL) staining (Wedam et al., 
2006). In an in vitro model of lung tumorigenesis, Bev¬ 
acizumab reduces AKT phosphorylation and activity 
(Inoue et al., 2007), although whether this occurs in the 
context of breast cancer is unknown. 


SUMMARY AND CONCLUSIONS 

We are now beginning to more clearly understand the 
molecular mechanisms of chemotherapy, endocrine ther¬ 
apy, and signal transduction inhibitor action and specifi¬ 
cally how these compounds regulate tumor cell death. 
However, several key challenges remain. How we address 
these issues will directly affect our ability to be successful 
in the clinical management of breast cancer and in improv¬ 
ing the outlook of women diagnosed with this disease. 

One difficulty that is encountered is the heterogeneity 
among tumors and the multiple ways by which tumor cells 
can die in response to chemotherapeutic drugs. It is hoped 
that advancement of our knowledge of the molecular mech¬ 
anisms of these drugs along with a greater understanding of 
tumorigenesis will enable more individualized treatment. In 
future, drug development and design must take into account 
the comprehensive cellular signaling mechanisms of inhibi¬ 
tion of the target and the likely mechanisms by which 
resistance can develop to these drugs. In this regard, advan¬ 
ces in biotechnology and bioinformatics will facilitate 
researchers and clinicians to assess high-throughput data 
gathered from DNA or proteomic arrays and tissue micro¬ 
arrays to enable molecular profiling of a patient’s tumor. 
Thus, individual patients can be given a specific dose and 
type of chemotherapeutic drug that will increase efficacy 
and reduce unwanted toxicides. 

A second challenge is redefining tumor cell death in the 
clinical setting. A reduction in tumor size is often consid¬ 
ered evidence of a therapeutic agent’s ability to induce 
cancer cell death. However, tumor shrinkage alone does not 
provide an understanding of the molecular mechanism(s) 
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that control cell death in this context, and the signal 
transduction pathways that operate in vitro may not always 
be active in vivo. For many of the therapies discussed 
above, we do not yet understand whether the in vitro and in 
vivo cell death mechanisms of action are the same. This is 
particularly true for the newer, more specific signal trans¬ 
duction inhibitors. For example, when used in the neo¬ 
adjuvant setting Trastuzumab can induce apoptosis in 
breast tumors, as measured by reduced AKT phosphoryla¬ 
tion and increased cleavage of caspase-3 (Mohsin et al., 
2005). In addition, Bevacizumab can induce DNA fragmen¬ 
tation in metastatic breast cancer (Wedam et al., 2006). 
However as more phase I/II clinical trials incorporate bio¬ 
marker discovery into their design, we anticipate that these 
studies will generate important new data that clarify the in 
vivo apoptotic effects of these signal transduction inhibitors 
and other chemotherapeutic agents in breast cancer. 
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Abstract 

Background: Many statistical methods have been proposed to identify disease biomarkers from gene expression 
profiles. However, from gene expression profile data alone, statistical methods often fail to identify biologically 
meaningful biomarkers related to a specific disease under study. In this paper, we develop a novel strategy, namely 
knowledge-guided multi-scale independent component analysis (ICA), to first infer regulatory signals and then 
identify biologically relevant biomarkers from microarray data. 

Results: Since gene expression levels reflect the joint effect of several underlying biological functions, disease- 
specific biomarkers may be involved in several distinct biological functions. To identify disease-specific biomarkers 
that provide unique mechanistic insights, a meta-data "knowledge gene pool" (KGP) is first constructed from 
multiple data sources to provide important information on the likely functions (such as gene ontology 
information) and regulatory events (such as promoter responsive elements) associated with potential genes of 
interest. The gene expression and biological meta data associated with the members of the KGP can then be used 
to guide subsequent analysis. ICA is then applied to multi-scale gene clusters to reveal regulatory modes reflecting 
the underlying biological mechanisms. Finally disease-specific biomarkers are extracted by their weighted 
connectivity scores associated with the extracted regulatory modes. A statistical significance test is used to 
evaluate the significance of transcription factor enrichment for the extracted gene set based on motif information. 
We applied the proposed method to yeast cell cycle microarray data and Rsf-1 -induced ovarian cancer microarray 
data. The results show that our knowledge-guided ICA approach can extract biologically meaningful regulatory 
modes and outperform several baseline methods for biomarker identification. 

Conclusion: We have proposed a novel method, namely knowledge-guided multi-scale ICA, to identify disease- 
specific biomarkers. The goal is to infer knowledge-relevant regulatory signals and then identify corresponding 
biomarkers through a multi-scale strategy. The approach has been successfully applied to two expression profiling 
experiments to demonstrate its improved performance in extracting biologically meaningful and disease-related 
biomarkers. More importantly, the proposed approach shows promising results to infer novel biomarkers for 
ovarian cancer and extend current knowledge. 
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Background 

Under their broadest definition, biomarkers include any 
biological or chemical indicator of a specific underlying 
process. In genetics, biomarkers are defined as a set of 
genes that are associated with a disease or are associated 
with the susceptibility to develop a specific disease. Micro¬ 
array technology makes it possible to measure simultane¬ 
ously the expression levels of thousands of genes, and 
identifying meaningful and useful biomarkers from these 
large data sets is a common goal. Specifically, investiga¬ 
tors attempt to detect genes differentially expressed across 
different types of tissue samples or the samples obtained 
under different experimental conditions. Traditional 
biomarker identification methods have mainly been 
applied to statistical analysis of microarray data alone; T- 
test [1] and significance analysis of microarray (SAM) [2] 
are frequently used to detect differentially expressed genes 
between two phenotypes. Several new statistical methods 
have been developed to analyze time-course microarray 
data. Storey et al. proposed an algorithm (EDGE) to fit the 
time-course microarray data with natural cubic splines, 
followed by a goodness-of-fit test to detect differentially 
expressed genes [3]. Conesa et al. also proposed a two-step 
regression approach to sequentially identify differentially 
expressed genes from time-course microarray data under 
different conditions [4]. However, these and many related 
approaches do not incorporate knowledge of gene func¬ 
tion, with respect to the phenotypes of interest, into their 
statistical models. 

Ideally, biomarkers should not only exhibit differential 
gene expressions between normal and disease samples, 
but more importantly, they should also reflect their bio¬ 
logical role in the disease phenotype. Most significance 
analysis methods applied to population (static) or time- 
course microarray data have the limitation that genes are 
analyzed independently and the interactions among them 
are ignored. Clustering methods, such as k-means cluster¬ 
ing [5] and self-organizing maps (SOMs) [6], were intro¬ 
duced to group the genes with similar expression patterns. 
A shortcoming of the clustering methods is that they do 
not allow genes to be shared by multiple clusters. How¬ 
ever, a single gene can be involved in multiple distinct 
biological processes [7]. One solution to this problem is 
to first infer gene regulatory networks [8-12] that appear 
to control or regulate phenotypically relevant biological 
functions, and then to extract the most biologically and 
statistically relevant biomarkers. 

The application of Independent Component Analysis 
(ICA) to microarray data has shown some utility in regu¬ 
latory network inference [10,13]. ICA is a statistically- 
principled linear decomposition method that models the 
observations as a linear combination of some latent (or 
hidden) variables [14]. From the perspective of a gene reg¬ 


ulatory mechanism, any gene expression value can be 
regarded as a combinational effect of some regulatory 
inputs such as transcription factors, cellular functions, or 
responses to experiment conditions [10,12]. As demon¬ 
strated in our previous work [15,16] along with that of 
others [10,12], novel applications of ICA to high-through- 
put data from microarray technology can help reveal 
dominant regulatory mechanisms. 

It is not a trivial task to link the estimated latent variables 
from ICA to real biological functions. To identify biologi¬ 
cally relevant biomarkers for a specific disease, the incor¬ 
poration of prior knowledge is of great importance to 
improving the accuracy of computational methods [17]. 
However, complete prior knowledge is often difficult to 
obtain. Some prior knowledge, such as regulatory motif 
information (promoter responsive element sequence) is 
available and can be incorporated into microarray data 
analysis to assist in regulatory module identification 
[18,19]. Recently, we have developed a new approach 
called motif-directed network component analysis 
(mNCA) to infer transcription regulatory activities (TFAs). 
This approach incorporates a stability analysis procedure 
to overcome the problem of many false positives in motif 
information [20]. Since we can only use known motifs, a 
clear limitation of the mNCA method is that we cannot 
infer any new potential regulatory biomarkers beyond 
prior knowledge from the model. 

In this paper, we propose a novel method, namely knowl¬ 
edge-guided multi-scale ICA, to identify disease-specific 
biomarkers beyond partial prior knowledge. We propose 
that a latent variable estimated by ICA from the entire 
gene expression population represents the joint effect of 
several biological functions. Disease-specific biomarkers 
could be involved in several different biological functions 
by the ICA latent variables or linear regulatory modes. 
Therefore, we first cluster the whole gene population into 
multiple sub-populations in which only a few biological 
processes are involved. We then uncover the knowledge¬ 
relevant regulatory modes in each subpopulation based 
on the partial prior knowledge. Finally, disease-specific 
biomarkers are extracted according to the strength of their 
association with the extracted regulatory modes. A statisti¬ 
cal test is applied to evaluate the significant enrichment of 
transcription factors for the extracted biomarkers based 
on motif information. 

For algorithm validation, we applied our approach to two 
time-course microarray data sets to demonstrate its 
improved performance. The first data set is a yeast cell 
cycle microarray data set with 104 well known cell cycle- 
related genes; the second is a remodeling and spacing fac¬ 
tor 1 (Rsf-1) induced microarray data set from a profiling 
study of ovarian cancer. The experimental results show 
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that our approach can identify biologically meaningful 
disease-specific biomarkers related to ovarian cancer, as 
compared to other gene selection methods with or with¬ 
out prior knowledge. 

Methods 

If we apply ICA directly onto an entire gene expression 
population, the extracted regulatory modes will reflect the 
joint effect of several biological functions, some of which 
are related to the disease under study and some are not. To 
overcome this problem, we developed a divide-and-con- 
quer strategy. We applied a knowledge-guided multi-scale 
ICA approach to extract disease-related regulatory modes 
reliably, and then we identify the biomarkers associated 


with the modes. The overall scheme is illustrated in Fig. 1. 
Firstly, a knowledge gene pool (KGP) is constructed by 
collecting the genes that are known to be relevant to the 
specific disease from available databases and literatures. 
Secondly, the entire gene population is divided into sub¬ 
populations by a clustering method applied to the micro¬ 
array data and, to identify regulatory modes, ICA is 
applied to each sub-population. The most relevant linear 
regulatory mode in each cluster is extracted using the gene 
metadata in the KGP and the associated biomarkers are 
ranked according to their weighted loading factors. 
Finally, motif enrichment analysis is conducted to evalu¬ 
ate the extracted biomarker candidates in terms of the 
enrichment of disease-related transcription factors. 




Figure I 

Flow chart of the proposed method - knowledge-guided multi-scale independent component analysis (ICA) - 
for biomarker identification. 
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Independent component analysis (ICA) 

Consider a gene expression data matrix X = [Xjj], whose 
rows correspond to different microarray samples, and col¬ 
umns correspond to individual genes. ICA decomposition 
model can be mathematically formulated as (assuming 
noiselessness for simplicity): 


X 1 - X X L> 

( 1 ) 

Um X i = X N^N X L' 

( 2 ) 


where Equation (1) describes the linear combination 
model with mixing matrix A, and Equation (2) the 
decomposition model with de-mixing matrix W. S, X and 
U are independent components, mixtures, and estimated 
independent components, respectively. M is the number 
of independent components, N the number of samples 
and L the number of genes. 

In microarray data analysis, an ICA model could be inter¬ 
preted as the expression value of an individual gene i 
under condition; (x,(j)) is the summation of different lin¬ 
ear modes in A at condition ; (a/^O)) weighted by inde¬ 
pendent loading factors s^/^in S[ 8 ], as shown below: 

M 

Xi(j) = ^Sifeafe(;), i = l,...,L;; = l,...,N. (3) 

k=i 

The linear modes in A might reflect distinct regulatory 
mechanisms involved in gene regulation, such as tran¬ 
scription factor (TF) activities. The FastiCA algorithm [21] 
can be utilized to obtain A and S based on the assumption 
that the components are statistically independent and 
have non-normal distributions (typically super-Caus- 
sian). This assumption is biologically plausible as most 
genes are not expected to change dramatically. Only the 
genes involved in distinct regulatory mechanisms will 
change, producing super-Caussian distributions in micro¬ 
array data. 

Several methods have been developed to associate a set of 
genes with a specific linear mode [10,12,22]. These meth¬ 
ods each assume that genes with the highest absolute 
loading values are the significant genes associated with 
linear mode a^j. In this paper, genes are ranked by a mod¬ 
ified criterion based on the same assumption as described 
in the next subsection. 

Knowledge-guided multi-scale ICA 

Since ICA is an unsupervised method, it is difficult to 
determine which linear modes are related to specific bio¬ 
logical functions. To identify the biomarkers relevant to a 
specific biological function, prior knowledge could pro¬ 
vide guidance for any computational method. In this 
approach, we will collect a KGP containing genes strongly 


associated with the disease and use these to guide the ICA 
approach for disease-relevant biomarker identification. 
Notice that the total connection strength of the knowl¬ 
edge genes associated with a disease-relevant linear mode 
would be larger, in principle, than that of irrelevant linear 
modes. Based on this observation, the most knowledge¬ 
relevant linear mode can be determined from the esti¬ 
mated ICA modes and the associated genes can then be 
extracted. 

However, if we apply ICA to the entire molecular profile, 
the estimated linear modes will likely reflect the joint 
effect of several biological functions, even for the most 
knowledge-relevant mode, because many disease-irrele¬ 
vant but differentially expressed genes co-exist in the data. 
Conversely, biomarkers should be involved in several dif¬ 
ferent linear modes in relation to underlying biological 
processes. Therefore, it is reasonable to first separate the 
entire profile into sub-populations. We can then find the 
specific ICA linear modes from different subsets of genes 
rather than from the whole gene population; this 
approach is referred to as the "multi-scale ICA" approach 
in this paper. Since these modes will be associated with 
different parts of the knowledge genes in the KGP, they are 
more suitable for biomarker identification. Clustering 
methods, such as k-means clusterin and SOMs, can be 
used to form the subsets of genes, with the assumption 
that the genes involved in similar biological functions are 
more likely to exhibit similar expression patterns than 
genes involved in different biological functions. 

Our method can be mathematically described as follows. 
Assume a whole gene population G in a microarray data 
X has been clustered into n subsets, Gj, Gj,..., G„. For each 
subset G, [i = 1, ..., n), we apply ICA to find the most 
knowledge-relevant linear mode according to the total 
connection strength of the knowleclge genes in this subset. 
Thus, the index; can be obtained by 

; = argmax(^|s^„|) m = ( 4 ) 

m j. 

where is the loading factor for gene g associated with 
linear mode the subset of knowledge genes in the i* 

cluster, and M; the number of independent components 
in the i* cluster. 

Then each gene g in this subset is assigned a score c, which 
is defined as follows: 

\ K ■ \ 

where te, is a weight to represent the significance of the lin¬ 
ear mode in the i* subset associated with the prior knowl- 
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edge. Here we define Wi as the proportion of all knowledge 
genes in this subset with respect to the entire KGP {K). 
Once the knowledge-relevant linear modes in all subsets 
are determined, each gene will have a score assigned and 
we rank the genes in terms of their scores. The larger the 
score, the more strongly the gene is related to the biologi¬ 
cal process. 

A key issue in this method is how to determine the opti¬ 
mal cluster number when forming the subsets of genes. In 
this paper, we determine the optimal cluster number by a 
cross-validation approach. Specifically, we assume the 
optimal cluster number is in some range, from 1 to an 
upper limit. For each cluster number, the knowledge 
genes are randomly stratified into a training gene set (as 
our partial prior knowledge gene set) and a test gene set by 
a ten-fold cross-validation approach. The method is 
applied with the partial prior knowledge genes to rank the 
whole gene population, and prediction accuracy is tested 
on the test gene set. The above procedure is repeated 10 
times, once for each left out fold, and an average accuracy 
over the ten folds is reported. We select the number with 
the highest average accuracy as the optimal cluster 
number for clustering. The upper limit of cluster numbers 
should be cautiously determined by the number of 
knowledge genes and the number of genes in the full pro¬ 
file. If the number of clusters is too large, it will lose the 
ability to infer novel biomarkers. An extreme case is that 
each individual gene forms a cluster and then we can only 
obtain the correct ranks for known genes. Genes not in the 
KGP will be randomly ranked, which is not informative at 
all for biomarker identification. If the cluster number is 
too small, the estimated linear modes may be incorrect 
due to the presence of many irrelevant genes. In our exper¬ 
iments, we set the upper limit as 10 for the yeast cell cycle 
data set and 15 for the ovarian cancer microarray data set, 
respectively. 

Knowledge gene pool (KGP) 

Each KGP is a collection of those genes that are potentially 
most strongly related to a specific disease. Usually there 
are thousands of genes in microarray data and most of 
them are not relevant to a specific disease even though 
they exhibit changes in gene expression level. The knowl¬ 
edge gene pool is an important asset for data analysis 
since it helps reduce many false positives. However, in 
most cases, little prior knowledge can be obtained, and 
the available knowledge is usually neither complete nor 
sufficiently accurate to fully define the specific disease 
under study. Thus, the KGP is best used as a guide for 
biomarker identification. In our studies, the KGP is prima¬ 
rily constructed from the published biological literature or 
from databases such as Ingenuity Pathway Analysis (IPA; 
Ingenuity Systems: http://www.ingenuity.com ) and the 
TRANSFAG 11.1 Professional Database [23]. 


Evaluation by motif enrichment analysis 

For microarray data analysis, there is often no ground 
truth (i.e., true biomarkers known to be related to a spe¬ 
cific biological process or disease under study) available 
for us to evaluate the performance of a biomarker identi¬ 
fication method. However, we know that gene expression 
is often regulated by transcription factors (TFs), proteins 
that bind to promoter or enhancer sequence elements 
upstream of genes and either activate or inhibit gene 
expression. In this paper, with the motif information pro¬ 
vided, we have designed a statistical test to evaluate the 
enrichment of transcription factors for a gene set identi¬ 
fied. A gene-transcription factor matrix M is generated 
where each element in the matrix, m^, represents how well 
the upstream sequence of a gene g matches the motif that 
a transcription factor/binds to. For human genes, 2 Kbp 
upstream regions from the transcription start sites (TSSs) 
of the genes are extracted from the UCSC genome data¬ 
bases [24]. Match™ [25] is then used to search the tran¬ 
scription factor binding site (TFBS) by its position- 
weighted matrices (PWMs) in a gene's upstream region, 
which outputs the scores of core similarity and matrix 
similarity for each matched motif Since one TF may have 
multiple TFBSs, we use the summation of average scores 
of core similarity and matrix similarity to set the final 
value of m^. 

Given a gene set S extracted by a computational method, 
a statistic to measure the enrichment of a specific tran¬ 
scription factor/is defined as 

= ( 6 ) 

g^s 

To calculate the statistical significance (p-value), we need 
to form a null distribution. The null hypothesis is that the 
gene set is randomly generated from the gene population 
and there is no significant enrichment of the transcription 
factor/. We randomly select gene sets with same size of S 
from the baseline gene population, and repeat B times to 
generate the corresponding null statistic enrichment score 

ef’, for b = I, ..., B. The null hypothesis distribution is 

assumed to be symmetric in this study. The p-value can be 
obtained for each gene set by calculating the probability 
that a null gene set has a statistic more extreme than the 
observed statistic. Mathematically, the p-value can be cal¬ 
culated by: 




number of members in {b\e^r^ >e j ,h=\,...,B] 

Ps = „ • 


( 7 ) 
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Baseline experiments and evaluation method 

To evaluate the performance of our proposed approach, 
EDGE algorithm [3] was first considered as a comparison 
method since it was specially designed to identify statisti¬ 
cally significant genes from time-course microarray data. 
However this comparison is insufficient due to that EDGE 
does not incorporate knowledge genes to provide guid¬ 
ance for biomarker identification. On the other hand, 
given partial prior knowledge genes, traditional super¬ 
vised classification methods are not suitable to predict 
whether a gene is related to prior knowledge because there 
is no true negative gene available. Therefore, we design 
three baseline biomarker identification methods that 
incorporate partial prior knowledge for a fair comparison. 
The first baseline IGA method is designed to evaluate if 
our multi-scale strategy by clustering offers an improved 
performance for biomarker identification. Two correla¬ 
tion methods with or without clustering are then imple¬ 
mented to identify the genes exhibiting similar patterns 
with partial prior genes, compared to the IGA approach 
focusing on regulatory mode identification. Specifically, 
the first method is a baseline IGA method where IGA is 
applied to the entire expression profile and the partial 
prior knowledge is used to find the most knowledge-rele¬ 
vant linear mode by Equation (4). Genes are ranked 
according to their absolute connection strengths associ¬ 
ated with this linear mode. The second method estimates 
the correlation with the partial prior knowledge genes 
without clustering (baseline correlation method-1). 
Genes are then ranked based on their absolute correlation 
coefficients between an individual gene expression profile 
and the average profile of partial prior knowledge genes. 
However, taking the average profile of all knowledge 
genes may reduce the sensitivity of detection, especially 
when the genes in KGP are not similar to each other. To 
overcome this problem, the third baseline method is a 
weighted correlation method based on a clustering 
approach (baseline correlation method-2). Similar to the 
multi-scale IGA method, the entire gene population is 
grouped into several sub-populations and a gene in each 
cluster is assigned a score. The score is the weighted abso¬ 
lute correlation coefficient between an individual gene 
expression profile and the average profile of partial prior 
knowledge genes in this cluster. The weight is then calcu¬ 
lated using Equation (5) and genes are ranked according 
to their scores. 

Given a ranked gene list and knowledge gene set, we can 
use the Receiver Operating Gharacteristic (ROG) curve 
[26] and the area under the curve (AUG) to measure the 
test accuracy for each biomarker identification method. 
ROC curve is a graphical plot of true positive rate (TPR) vs. 
false positive rate (EPR). AUG is an important perform¬ 
ance measure that provides an overall measure of accuracy 
for the test. Given a ranked gene list (gj, g 2 , ..., g„) with a 


total of n genes and the ground truth gene set with k 
genes, tme positive rate and false positive rate, when 
selecting top i genes G, in the list, are calculated as follows: 


Gj-fiG/j 

(8) 

k 

i-\ GjDGij 1 
n-k 

(9) 


Results and discussion 

We applied our knowledge-guided multi-scale IGA 
method to two gene expression profiling studies: (1) a 
yeast cell cycle microarray data set [27] and (2) an Rsf-1- 
induced microarray data set. The yeast cell cycle data set 
consists of the expression of 6178 Open Reading frames 
(OREs) during the cell replication cycle in the budding 
yeast (Saccharomyces cerevisiae). The data set consists of 
77 samples corresponding to various experiment condi¬ 
tions. Approximately 800 genes have been identified as 
cycle-regulated genes; among these 104 genes have been 
well studied [27]. We us The goal of this experiment is to 
identify the cell cycle-regulated linear modes and then 
extract the corresponding genes associated with the cell 
cycle. We used the 104 genes as our training knowledge 
gene set and the remaining 704 genes as an independent 
test set for evaluation. 

The Rsf-1-induced microarray data set was acquired and 
analyzed in our experiment. The dataset was generated 
using Affymetrix Human Genome U133 Plus 2.0 Arrays 
from an expression profiling study of ovarian cancer at the 
Johns Hopkins Medical Institutions. The study was 
designed to identify Rsf-1 regulated genes in ovarian can¬ 
cer; Rsf-1 (also known as HBXAP) is a newly discovered 
gene frequently amplified in ovarian cancer [28]; the pro¬ 
tein participates in chromatin remodeling which is essen¬ 
tial for a variety of cellular functions including 
transcription, DNA replication, and DNA repair. The data 
set is composed of 7 samples with two biological condi¬ 
tions (Rsf-1-induced and not Rsf-1-induced) and four 
time points at 0 hour, 6 hours, 18 hours, and 30 hours. 
We used Affymetrix's Probe Logarithmic Intensity Error 
(PLIER) algorithm with quantile normalization to pre- 
process the original intensity data for gene expression 
measurements [29]. After the preprocessing, we obtained 
expression measurements of 54,675 probe sets for each 
sample. 

The EDGE algorithm was first applied to select statistically 
significant expressed genes from yeast cell cycle data and 
Rsf-1 induced ovarian cancer data, respectively. After rank¬ 
ing all genes in terms of their q-values estimated from 
EDGE, we calculated AUG values for yeast cell cycle- 
related genes and ovarian cancer-related genes, respec- 
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lively (see below). As a result, both AUC values are rela¬ 
tively low (around 0.5), which indicates that the genes 
identified from pure data-driven methods (such as EDGE; 
without prior knowledge guidance) may not show strong 
biological relevance. 

Yeast cell cycle data 

To reduce computational complexity, k-means clustering 
was used to form the subsets of genes for both datasets. 
The number of independent components in the EastICA 
algorithm was set to five for this dataset, since our previ¬ 
ous dimension estimation approach with a stability anal¬ 
ysis procedure [16] showed that five independent 
components are sufficient to describe the gene expression 
data. We first conducted ten-fold cross-validation on the 
well studied 104 cell cycle-related genes. For each fold, the 
optimal cluster number is determined by a nested cross- 
validation procedure on the training gene set, as illus¬ 
trated in Fig. 2. The number of clusters ranges from 1 to 
10. Notice that when the number is 1, no clustering is 
needed and the algorithm reduces to the baseline ICA 
method. Each ten-fold cross-validation is repeated 10- 



times with different randomly chosen stratified sets of 
knowledge genes. Since the k-means clustering method 
generates different results depending on its random ini¬ 
tialization, we repeat the procedure ten times with differ¬ 
ent initializations to obtain more reliable results. The 
results reported here are the average results of the ten dif¬ 
ferent initializations. 

The resulting average AUC value of ten-fold cross-valida¬ 
tion on 104 genes is 0.9206 with standard deviation of 
0.0470. Fig. 3 shows the histogram of determined optimal 
number of clusters during the ten-fold cross-validation 
procedure. From the figure we can see the most frequent 
number of clusters is five. Then we implemented three 
baseline methods for ten-fold cross-validation as compar¬ 
isons. For baseline correlation method-2, we chose the 
optimal cluster number from the multi-scale ICA method 
for a fair comparison. The ROC curves of ten-fold cross 
validation for the two baseline correlation methods, the 
baseline ICA method, and our multi-scale ICA method are 
shown in Fig. 4. The ROC curves show that the multi-scale 
ICA method outperforms the baseline correlation 
method-2, and that the baseline ICA approach is better 
than the baseline correlation method-1. Overall, the pro¬ 
posed multi-scale ICA method significantly outperforms 
all three baseline methods as estimated by the Kol- 
mogorov-Smirnov (K-S) one-sided test (Table 1). 

To further test the generalizability of our method, we con¬ 
ducted ten-fold cross-validation on the 104 genes using a 
subset of samples. The original data set includes 77 sam¬ 
ples synchronized by three independent methods: a fac¬ 
tor arrest, elutriation and arrest of a ede 15 temperature- 
sensitive mutant [27j. We selected 63 samples from all the 


Yeast cell cycle dataset 
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Figure 2 

Procedure of ten-fold cross-validation. The optimal 
number of clusters is determined by a nested ten-fold cross- 
validation on training gene set. 


Figure 3 

Histogram of determined optimal number of clusters 
in ten-fold cross- validation on yeast cell cycle data 
set. 
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Figure 4 

ROC curves of ten-fold cross-validation for four 
biomarker identification methods on training knowl¬ 
edge gene set of yeast cell cycle data set. Solid line rep¬ 
resents the multi-scale ICA method; dash-dotted line 
represents the baseline ICA method; dotted line represents 
the correlation method-1; dash line represents the correla¬ 
tion method-2. 


samples by excluding those samples under elutriation 
condition. The resulting average AUC value is 0.9157 with 
standard deviation of 0.0458. Also the most frequent opti¬ 
mal cluster number is five (with a frequency of 65%), 
which shows a great consistency when compared to the 
result using all the samples. 

All 104 knowledge genes were then used as a training set 
in the algorithm to test 704 cell cycle-related genes for all 
four methods. During the training, we still used tenfold 
cross-validation to determine the optimal number of clus¬ 
ters. Fig. 5 shows the average AUC values and their stand¬ 
ard deviations in ten-fold cross-validation across different 
number of clusters. From the figure we can see that the 
average AUC (standard deviation), starting at 0.892 
(0.0006) for the full gene population, decreases a little at 
two and three clusters. The AUC increases gradually and 
reaches the peak of 0.9274 (0.0071) at five clusters, at 
which it remains constant. So the optimal number of clus- 

Table I: P-values of Kolmogorov-Smirnov test for different 
methods on yeast cell cycle data using ten-fold cross-validation 

Method I Method 2 P-values of K-S test 


Optimal ICA Baseline ICA < le-IO 

Optimal ICA Correlation method I < I e-10 

Optimal ICA Correlation method 2 < le-S 


Yeast cell cycle dataset 



Figure 5 

Average area under the curve (AUC) values using 
ten-fold cross-validation with different numbers of 
clusters on 104 knowledge genes. The knowledge- 
guided multi-scale ICA method is applied to yeast cell cycle 
data set for the identification of cell cycle-related genes. 


ters for multi-scale ICA approach is five. Then an inde¬ 
pendent evaluation was performed on the test gene set 
and the ROC curves for these four methods was calculated 
when the cluster number is five (Fig. 6). The ICA-based 
methods significantly outperform the baseline correlation 
methods, and the multi-scale ICA is the best method 
when compared with the three baseline methods (Table 
2 ). 



Figure 6 

ROC curves of four biomarker identification meth¬ 
ods on yeast cell cycle data set with an independent 
test gene set. 
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Table 2: P-values of kolmogorov-Smirnov test for different 
methods on yeast cell cycle data using an independent test gene 
set 


Method 1 

Method 2 

P-value of K-S test 

Optimal ICA 

Baseline ICA 

< le-IO 

Optimal ICA 

Correlation method 1 

< le-IO 

Optimal ICA 

Correlation method 2 

< le-IO 


We examined in detail the extracted knowledge-relevant 
linear modes and the biological functions of their associ¬ 


ated cell cycle-regulated genes. Fig. 7 shows five knowl¬ 
edge-relevant linear modes and their weights as identified 
when the number of clusters is set at the optimum 
number of five (Fig. 5). The top three linear modes have 
much higher weights than the lower two modes and their 
estimated TFAs clearly show periodic patterns related to 
cell cycle. We examined the biological functions of these 
well-known cell cycle-regulated genes associated with 
these three linear modes. The majorities of genes in linear 
mode L3 are associated with the M/Gl boundary or are 
known transcriptional targets of STE12/MCM1. Most of 


LI, = 0.56 



-0.5 




Figure 7 

Five cell cycle-related linear modes in the proposed multi-scale ICA approach on yeast cell cycle data set. The 

weight is also listed in the figure for each linear mode. 
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the genes in linear mode LI are SCB/MCB regulated in late 
G1 and S phase. Finally, many genes in linear mode L2 are 
in S/G2 and G2/M phases. In summary, we can see that 
the linear modes L3, LI, and L2 correspond to different 
biological functions in cell cycle process. 

The top 10 genes selected by multi-scale IGA method are 
listed in Table 3. Among them, four genes (GLN2, MGDl, 
POL30 and RNRl) are in the known training gene set. All 
other genes (GSI2, PRY2, YOXl, TOS4, AXL2 and GRHl) 
are the genes related to cell cycle beyond our training gene 
set, i.e., in the test gene set. The results show that our 
method is effective at finding novel biomarkers beyond 
knowledge, which is clearly an important feature of the 
proposed approach for novel biomarker identification 
beyond prior knowledge. In most of cases, especially for 
human disease, knowledge genes are limited and we need 
to infer the new ones from partial knowledge for biomar¬ 
ker discovery. 

Rsf-1-induced gene expression data 

Knowledge gene pool (KGP) 

To construct the KGP, we started with the known gene Rsf- 
1 and its related genes, NF-kappa B (NFKBl) and 
SMARCA5 (also known as hSNF2FI) as reported in [30], 
to search the databases. We used Ingenuity Pathway Anal¬ 
ysis (IPA) to extract 95 genes that are thought to be 
directly related to NFKBl and SMARGA5. Note that there 
is no network related to Rsf-1 in the current IPA database. 
We also included 43 genes from TRANSFAC 11.1 Profes¬ 
sional Database [23], whose protein products are tran¬ 
scription factors biologically relevant to ovarian cancer as 
reported in literature. Hence, our KGP consists of 141 dis¬ 
tinct Affymetrix probe set identifiers that represent the 
expression values for the 138 genes. 

Multi-scale ICA results 

We used 'tanh' nonlinearity in the FastIGA algorithm: 
other parameters were set at their default values. The 
number of the independent components is set to a maxi¬ 


mum value of 6 due to the limitation of sample size. Ten¬ 
fold cross-validation was conducted on our partial prior 
knowledge genes, where the optimal cluster number was 
determined by a nested cross-validation approached for 
each fold as shown in Fig. 2. The number of clusters was 
set from 1 to 15. We also repeated 10 times for ten-fold 
cross-validation and k-means clustering in order to gener¬ 
ate more reliable results. The resulting average AUG is 
0.7203 with standard deviation of 0.0804. Fig. 8 shows 
the histogram of determined optimal cluster number in 
the ten-fold cross-validation procedure and we can see 
that the most frequent cluster number is 4. We compared 
the ROG curves for the two baseline correlation methods, 
the baseline IGA and the multi-scale ICA for ten-fold 
cross-validation (Fig. 9). The results in Table 4 show that 
multi-scale ICA method performs significantly better than 
baseline ICA method and baseline correlation method-1 
with p-value < le-10, while performing marginally better 
than baseline correlation method-2 (p-value = 0.0037). 
Since baseline correlation method-2 also calculates clus¬ 
tered average profiles of the prior knowledge genes, this 
result indicates that the multi-scale approach by clustering 
is an effective strategy to improve the performance for 
ovarian cancer-related biomarker identification. On the 
other hand, a major weakness in baseline correlation 
method-1 lies in that the average profile of all prior 
knowledge is used when their expression profdes are not 
similar to each other. 

Evaluation by motif analysis 

All knowledge genes were used as the training set in the 
algorithm to rank the whole gene population for all four 
methods. During the training, we still used ten-fold cross- 
validation to determine the optimal number of clusters in 
multi-scale ICA method. Fig. 10 shows the average AUG 
values and their standard deviations obtained with differ¬ 
ent numbers of clusters for the ten-fold cross-validation; 
the average AUG (standard deviation), starting at 0.6146 
(0.0004) for the whole gene population, increases to 
0.7329 (0.0253) at two clusters and reaches the maximum 


Table 3: Top 10 genes selected by the proposed multi-scale ICA method on yeast cell cycle data 


Rank 

ORF 

Name 

Short Description 

1 

YPL256C 

CLN2 

CycLiN; Gl cyclin invoived in reguiation of the ceii cycie 

2 

YOL007C 

CSI2 

Chitin Synthesis Involved; protein of unknown function 

3 

YKR0I3W 

PRY2 

Pathogen Reiated in Yeast; protein of unknown function 

4 

YDL003W 

MCDI 

Mitotic Chromosome Determinant; expression is ceii cycle regulated and peaks in S phase 

5 

YML027W 

YOXl 

Homeodomain-containing transcriptional repressor 

6 

YBR088C 

POL30 

POLymerase; proliferating cell nuclear antigen (PCNA) 

7 

YLRI83C 

TOS4 

Target of SBF; promoters of some genes involved in pheromone response and cell cycle; 

8 

YILI40W 

AXL2 

AXiaL budding pattern; glycosylated by Pmt4p; potential Cdc28p substrate 

9 

YGRI89C 

CRHI 

Congo Red Hypersensitive; cell wall protein; putative chitin transglycosidase 

10 

YER070W 

RNRl 

RiboNucleotide Reductase; the RNR complex catalyzes the rate-limiting step in dNTP synthesis and is 
regulated by DNA replication and DNA damage checkpoint pathways via localization of the small subunits 
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Ovarian cancer dataset 



number of clusters 


Figure 8 

Histogram of determined optimal number of clusters 
in ten-fold cross- validation on ovarian cancer data 
set. 


value of 0.7343 (0.0210) at four clusters, and remains 
almost constant thereafter. Therefore, the optimal 
number of cluster for the multi-scale ICA approach was 
selected as four. Specifically, we examined estimated lin¬ 
ear modes from ICA methods. Fig. 11 shows the estimated 
knowledge-related TFAs using baseline ICA method and 
Fig. 12 shows the estimated four knowledge-related TFAs 
and their weights using our multi-scale ICA method. We 
observe that one of the TFA patterns in Fig. 12 (L3) is sim¬ 
ilar with that in Fig. 11, which indicates that multi-scale 
ICA method can estimate more TFAs for knowledge- 
related genes than baseline ICA method. Four different 
linear modes and their weights in Fig. 12 also indicate that 
the expression patterns of the genes in KGP are not similar 
to each other, which seems to be the major reason behind 
that baseline correlation method-1 (using the average pro¬ 
file of all prior knowledge) underperforms other meth¬ 
ods. 

For the final ranked gene lists, we performed motif enrich¬ 
ment analysis to evaluate the performance of each of the 
four different methods for biomarker identification. Spe¬ 
cifically, among 43 ovarian cancer-related TFs extracted 
fromTRANSFAC 11.1 Professional Database [23], 14 TFs 
have their PWMs available and we generated the gene-TF 
matrix M for them. For each TF, a PWM was chosen from 


Table 4: P-values of Kolmogorov-Smirnov test for different 
methods on Rsf-1-induced ovarian cancer microarray data 


Method 1 

Method 2 

p-value of the K-S test 

Optimal ICA 

Baseline ICA 

< le-IO 

Optimal ICA 

Correlation method 1 

< le-IO 

Optimal ICA 

Correlation method 2 

0.0037 



Figure 9 

ROC curves of ten-fold cross-validation for four 
biomarker identification methods on knowledge 
gene set of ovarian cancer data set. Solid line repre¬ 
sents the multi-scale ICA method; dash-dotted line repre¬ 
sents the baseline ICA method; dotted line represents the 
correlation method-1; dash line represents the correlation 
method-2. 


the vertebrate non-redundant profiles. Table 5 lists their 
TRANSFAC PWM entry IDs and the corresponding TF 
descriptions. To increase the statistical power, we con¬ 
ducted multiple tests by selecting different gene sets with 
different sizes for different gene selection methods. The 
number of genes in each gene set ranges from 100 to 
1,000 and the average p-values for 14 TFs are reported. 


Ovarian cancer dataset 



Figure 10 

Average AUC values using ten-fold cross-validation 
across different numbers of clusters. The knowledge- 
guided multi-scale ICA method is applied to Rsf-1-induced 
ovarian cancer microarray data set for the identification of 
disease-specific biomarkers. 
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Figure I I 

Estimated knowledge-related TFAs using baseline 
ICA method. X-axis represents the time and Y-axis repre¬ 
sents the estimated TFAs. 


Fig. 13 shows the average p-values of TFs enrichment for 
different gene sets selected hy different methods. Both ICA 
methods outperform the baseline correlation methods in 


terms of finding more enriched ovarian cancer-related TFs 
binding sites. Moreover, our multi-scale ICA method is 
slightly better than baseline ICA method for motif enrich¬ 
ment. It is worth noting that although both multi-scale 
ICA and baseline ICA methods can extract ovarian cancer- 
related biomarkers with significant motif enrichment, 
multi-scale ICA method can help reveal more biomarkers 
related to ovarian cancer. For this experiment, it is also 
expected to have similar TF enrichment from both meth¬ 
ods, since one common linear mode is revealed by both 
methods (i.e., the mode in Fig. 11 is very similar with the 
L3 mode in Fig. 12). From the pattern of this common 
mode, we postulate that this is a major mode related to 
RSF-1-induced ovarian cancer. Therefore, the genes 
extracted from this mode will show a similar significance 
level inTF enrichment (as shown in Fig. 13). However, the 
multi-level ICA approach can extract other linear modes 
related to ovarian cancer (see Fig. 12). Apparently, the 
biomarkers related to these other modes cannot be iden¬ 
tified with the baseline ICA approach. This can be sup¬ 
ported by the ROC curves in Fig. 9, showing an improved 
performance of using multi-scale ICA approach compared 
to that of using baseline ICA approach. 



LI, wi =0.1765 




Estimated four knowledge-related TFAs using the proposed muti-scale ICA method. X-axis represents the time 
and Y-axis represents the estimated TFAs. 
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Table 5: Ovarian cancer-related TFs and their TRANSFAC entry IDs & descriptions 


Index TF Name PWM Consensus Binding Site Factor Description 

Access No. 


1 

AP-2 

MOO 189 

MKCCCSCNGGCG 

2 

AP-2alpha 

M00469 

GCCNNNRGS 

3 

AP-2alphaA 

MO 1045 

ANNGCCTNAGGSNNT 

4 

AP-2gamma 

M00470 

GCCYNNGGS 

5 

AP-2rep 

M00933 

CCCCGCCCCN 

6 

BRCAI 

MO 1082 

KTNNGTTG 

7 

E2F 

MOOS 16 

TTTSGCGCGMNR 

8 

Elk-1 

M00007 

NAAACMGGAAGTNCVH 

9 

NF-kappaB 

M00774 

N N N N KGG RAANTCCCN 

10 

Sp 1 

M00933 

CCCCGCCCCN 

1 1 

TGIF 

M004I8 

AGCTGTCANNA 

12 

c-Rel 

M00053 

SGGRNTTTCC 

13 

P53 

M00272 

NGRCWTGYCY 

14 

ER 

M00I9I 

NNARGNCANNNTGACCYNN 


Activator protein 2 

Activating enhancer binding protein 2 alpha 
Activating protein 2, AP-2A, Ker-1 
Activator protein 2gamma, ERF-1 
Specificity protein I, stimulating protein I 
Breast cancer type I susceptibility protein 

EIIF protein, activator of myc, important for pi07 promoter activity 

Elk I, member of ETS oncogene family 

Nuclear factor kappa B, pSO 

Specificity protein I, stimulating protein I 

S'-TG-3' interacting factor, TG-interacting factor, TGFB-induced factor 
Nuclear factor kappa B c-Rel, p68 
Tumor protein p53, TRPS3 
Estrogen receptor 


Discussion with biological interpretation 

To enable a more detailed analysis, the top 10 genes 
extracted by optimal multi-scale ICA method are listed in 
Table 6 and the putative TFs in their promoter regions are 
shown in Fig. 14. Since none of the genes are in the KGP, 
they were entered into an Ingenuity Pathways Analysis 
(IPA) where we found that all of these genes can be incor¬ 


porated into a single hypothetical network (Fig. 15). The 
major functions of this network are involved in gene 
expression, cancer development, and cellular motility. 
Five genes, FOSE, FOS, EGRl, 1L8 and CDK2, are in the 
cancer module with p-values ranging from 1.84E-7 to 
6.5E-3. EOSB and FOS belong to the Fos family that het- 
ero-dimerizes with Jun proteins to form the AP-1 tran- 


Rsf-1 induced ovarian cancer dataset 



number of top genes 

Multi-scale ICA ♦ Baseline ICA 

Baseline correlation method-1 x Baseline correlation method-2 


Figure 13 

Average p-value of TF enrichment for different gene sets associated with different methods on Rsf-1-induced 
ovarian cancer microarray data set. 
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Table 6: Top 10 genes selected by the proposed multi-scale ICA on Rsf-1-induced microarray data 


Rank 

Probe Set ID 

Gene Symbol 

Gene Full Name 

1 

202768_at 

FOSB 

FBJ murine osteosarcoma viral oncogene homolog B 

2 

209l89_at 

FOS 

v-fos FBJ murine osteosarcoma viral oncogene homolog 

3 

205476_at 

CCL20 

chemokine (C-C motif) ligand 20 

4 

2l2009_s_at 

STIPI 

stress-induced-phosphoprotein 1 

5 

20979S_at 

CD69 

CD69 molecule 

6 

21IS06_s_at 

IL8 

interleukin 8 

7 

l5S79IO_at 

HSP90ABI 

heat shock protein 90 kDa alpha (cytosolic), class B member 1 

8 

227404_s_at 

EGRl 

Early growth response 1 

9 

21I804_s_at 

CDK2 

cyclin-dependent kinase 2 

10 

20862l_s_at 

VIL2 

villin 2 


scription factor complex [31]. AP-1 transcription factors 
control rapid responses of mammalian cells to stimuli 
that are associated with proliferation, differentiation and 
transformation [32]. lL-8 is a member of the C-X-C family 
of chemokines, and overexpression of lL-8 is observed in 
subsets of human ovarian cancer cells [33]. Previous stud¬ 
ies have shown that the expression of interleukin-8 (IL-8) 
is directly correlated with the progression of human ovar¬ 
ian carcinomas implanted into the peritoneal cavity of 
nude mice [34]. The early growth response 1 (EGRl) is a 
transcription factor that acts as both tumor suppressor 
and tumor promoter depending on the cellular context. In 
the experiments of multiple pituitary and ovarian defects 
in Krox-24 (NGFI-A, Egr-1)-targeted mice, EGRl was 
implicated as a novel key regulator of anterior pituitary 
physiology and that it may play important roles in specific 


cell lineages [35]. CDK2 is known to be involved in cell 
cycle regulation and the overexpression of GDK2 is asso¬ 
ciated with malignancy in ovarian tumors [36]. 

Conclusion 

Biomarker identification is an important goal in many 
microarray data analyses. We propose a novel method, 
knowledge-guided multi-scale ICA, to find relevant 
biomarkers associated with specific biological functions. 
We aimed to infer knowledge-relevant regulatory signals 
and then identify corresponding biomarkers through a 
multi-scale strategy. A knowledge gene pool is constmeted 
from multiple knowledge sources to help identify disease- 
specific gene clusters. By applying ICA to multi-scale gene 
clusters, an examination of the revealed regulatory modes 
can uncover knowledge of the underlying biological regu- 
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Figure 14 

TFs and their locations in 2 Kbp promoter region for top 10 genes selected by our approach. The promoter 
region is represented from -2,000 bp to 0 from TSS and each block in the figure represents a 100 bp region. 
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Figure IS 

The network obtained from IPA with all of top 10 genes in Table 6. Five genes, FOSE, FOS, EGRI, IL8 and CDK2, are 
highly related to cancer module. 


latory mechanisms. In addition, we have designed a statis¬ 
tical test procedure to measure the transcription factor 
enrichment of a selected gene set based on motif informa¬ 
tion. The approach was successfully applied to two gene 
expression profde data sets to identify biomarkers: yeast 
cell cycle microarray data and Rsf-1-induced microarray 
data. The experimental results show that our method can 


extract apparently biologically meaningful and condition- 
related biomarkers. The performance of the proposed 
method significantly outperforms several baseline meth¬ 
ods for biomarker identification. More importantly, the 
proposed method has notable potential to discover novel 
biomarkers beyond any partial prior knowledge. 
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Abstract 

One-third of all estrogen receptor (ER)-positive breast tnmors 
treated with endocrine therapy fail to respond, and the 
remainder is likely to relapse in the futnre. Almost all data on 
endocrine resistance has been obtained in models of invasive 
dnctal carcinoma (IDC). However, invasive lobnlar carcinomas 
(ILC) comprise up to 15% of newly diagnosed invasive breast 
cancers each year and, whereas the incidence of IDC has 
remained relatively constant during the last 20 years, the 
prevalence of ILC continues to increase among postmeno¬ 
pausal women. We report a new model of Tamoxifen (TAM)- 
resistant invasive lobular breast carcinoma cells that provides 
novel insights into the molecular mechanisms of endocrine 
resistance. S1JM44 cells express ER and are sensitive to the 
growth inhibitory effects of antiestrogens. Selection for 
resistance to 4-hydroxytamoxifen led to the development of 
the SUM44/LCCTam cell line, which exhibits decreased 
expression of ERot and increased expression of the estrogen- 
related receptor 7 (ERRy). Knockdown of ERRy in SUM44/ 
LCCTam cells by siRNA restores TAM sensitivity, and over¬ 
expression of ERRy blocks the growth-inhibitory effects of 
TAM in S1JM44 and MDA-ME-134 VI lobular breast cancer 
cells. ERRy-driven transcription is also increased in SUM44/ 
LCCTam, and inhibition of activator protein 1 (API) can 
restore or enhance TAM sensitivity. These data support a role 
for ERRy/APl signaling in the development of TAM resistance 
and suggest that expression of ERRy may be a marker of poor 
TAM response. [Cancer Res 2008;68(21):8908-17] 

Introduction 

Breast cancer is the second-most common cause of cancer- 
related death in women (1). One of the challenges in treating breast 
cancer is addressing the biological heterogeneity evident in the 
existence of several histologic and molecular subtypes. Two of the 
major histologic breast cancer classifications are invasive ductal 
carcinoma (IDC) and invasive lobular carcinoma (ILC). Currently, 
ILCs comprise up to 15% of invasive breast cancer diagnoses 
annually (2). Although the incidence of IDC has remained relatively 


Note: Supplementary data for this article are available at Cancer Research Online 
(http://cancerres.aacrjournaIs.org/). 
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constant during the last 20 years, a significant increase in ILC 
diagnosis is evident among postmenopausal women in Western 
Europe and the United States (reviewed in ref 3). Although the 
increased use of estrogen plus progestin hormone replacement 
therapy for relief of perimenopausal and postmenopausal symp¬ 
toms during this same time period may have contributed to the 
increase in ILC incidence (3), the precise mechanism(s) remains 
uncertain. 

The clinical and pathologic features of lobular tumors are 
unique. ILC typically invades in a linear pattern, creating a longer, 
thinner mass, which is more difficult to detect by mammography, 
ultrasound, or breast self-exam (3). ILCs have a greater tendency to 
be bilateral, and women with this type of breast cancer are 
frequently older and have larger tumors at the time of their 
diagnosis (3). A higher incidence of ILC has been reported among 
women who initially present to the clinic with metastatic breast 
cancer (4). Although recent clinical studies imply that ILC is less 
responsive to neoadjuvant cytotoxic chemotherapy as a precursor 
to breast-conserving surgery (5, 6), there are conflicting reports as 
to whether patients diagnosed with ILC have a poorer, equivalent, 
or improved prognosis and overall survival when compared with 
IDC (reviewed in ref 3). 

Breast cancer patients whose tumors express estrogen receptor 
(ER) a (ERct) may be offered endocrine or antiestrogen therapy in 
addition to or in place of conventional chemotherapies. Currently, 
the most widely used antiestrogen is the triphenylethylene 
Tamoxifen (TAM), which functions as a partial antagonist by 
competing with estrogen for binding to the ER. TAM is known to 
induce a statistically significant improvement in the overall 
survival rate from breast cancer (7), and ~ 70% of all ER-positive 
(ER+)/progesterone receptor (PR)-positive (PR+) breast cancers 
will respond to TAM. When compared with IDC, a significantly 
greater percentage of ILC tumors are ER+/PR+ (discussed in ref 3), 
suggesting that women diagnosed with this tumor subtype should 
be ideal candidates for endocrine therapy. However, study results 
differ as to whether ILC patients experience a better or worse risk 
of mortality than IDC patients after antiestrogen treatment (8, 9). 

Regardless of tumor subtype, the development of endocrine 
resistance is a pervasive clinical problem (10-12). One-third of ER+/ 
PR+ breast tumors treated with TAM do not respond to initial 
treatment, and the remaining 70% are stiU at risk to relapse in the 
future. A number of mechanisms have been proposed to control 
antiestrogen resistance in ER+ breast cancer (13), but many details of 
these mechanisms continue to be unclear. Studying endocrine 
resistance specifically in ILC has not been possible because of the 
lack of appropriate models; the most common models of resistance 
(notably MCF-7 cells) are derived from ductal adenocarcinomas (14). 
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Endocrine Resistance in Lobular Carcinoma 


Given the unique clinical and molecular features of lobular 
tumors, and the suggestion that ILC tumors may respond less well to 
endocrine therapy, we have developed an ILC-specific cell culture 
model of endocrine resistance. The SUM44 breast cancer cell line 
was isolated from an ILC metastasis (15), is ER+/PR+, and displays 
other common features of ILC such as the loss of E-cadherin (16). We 
show that SUM44 cells contain functional ER and are sensitive to 
growth inhibition by antiestrogens. Selection of SUM44 cells against 
4-hydroxytamoxifen (4HT) led to the establishment of the SUM44/ 
LCCTam cell line, which is stably resistant to TAM. We then 
identified candidate genes associated with the endocrine resistant 
phenotype in SUM44/LCCTam cells and found changes in the 
expression of ERa and the estrogen-related receptor y (ERRy). Our 
mechanistic studies show that knockdown of ERRy in the resistant 
cell line, and overexpression of ERRy in endocrine-responsive 
lobular breast cancer cells, modulates TAM sensitivity. Einally, we 
show that ERRy-driven transcription is increased in the resistant 
SUM44/LCCTam cell line, and inhibition of activator protein 1 (API) 
can restore or enhance TAM sensitivity in this model system. 

Materials and Methods 

Cell culture and reagents. All cells were shown to be free of 
Mycoplasma spp. contamination and maintained in a humidified incubator 
at 37°C in an atmosphere containing 95% air/5% C02. Routine tissue culture 
reagents (culture medium and additives, PBS, trypsin, etc.) were purchased 
from Invitrogen. 

SUM44 cells were routinely cultured in serum-free medium plus insulin 
and hydrocortisone (SFIH) as described previously (15). LCCTam cells were 
maintained in SFIH containing 500 nmol/L 4HT (Sigma). LCCTam cells 
were cultured in SFIH in the absence of 4HT for 1 wk before all experiments. 
When SUM44 and LCCTam were passaged, cells were seeded in SFIH 
containing 2% fetal bovine serum (FBS) for the first 24 h to neutralize 
trypsin and promote cell attachment. MCF-7 cells were originally obtained 
from Dr. Marvin Rich (Karmanos Cancer Center, Detroit, MI), and MDA- 
MB-134 VI breast cancer cells were purchased from American Type Culture 
Collection; both were maintained in improved minimal essential medium 
with phenol red supplemented with 5% FBS. 

17[i-Estradiol (estradiol, E2) was purchased from Sigma; Fulvestrant (ICI 
182,780; Fulv) and the c-JUN peptide inhibitor were purchased from Tocris 
Bioscience. The 3xERE-tk-luc promoter-reporter plasmid was kindly pro¬ 
vided by Dr. Malcolm G. Parker (Imperial College, London, United Kingdom; 
ref 17), 3xSFlRE-luciferase was a gift from Dr. Jean-Marc Vanacker (Institut 
de Genomique Fonctionnelle de Lyon, Universite de Lyon, Lyon, France; 
ref 18), and 3xAPl-luciferase was generously provided by Dr. Richard Pestell 
(Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, PA). 
The plasmid encoding wild-type murine ERRy bearing an NH2-terminal 
hemagglutinin (HA) tag (pSG5-HA-ERR3) was a gift from Dr. Michael 
Stallcup (Keck School of Medicine, University of Southern California, Los 
Angeles, CA; ref 19). Small inhibitory RNA (siRNA) oligonucleotide duplexes 
directed against ERRy (siGENOME SMARTpool), nonsilencing control 
oligonucleotides, and the DharmaFECT 1 reagent were purchased from 
Dharmacon. The FuGene 6 transfection reagent was purchased from Roche. 

Luciferase promoter-reporter assays. Cells were seeded in SFIH at a 
density of 9 x 10"* cells per well in 12-well plastic tissue culture dishes for 24 
to 48 h before transfection with 0.6 pg luciferase promoter-reporter 
construct and 0.2 pg phRL-SV40 Renilla internal control (Promega). The 
following day, transfected cells were refed with SFIH, or SFIH containing 
10 nmol/L E2,1,000 nmol/L 4HT, 100 nmol/L Fulv, 20 pmol/L c-JUN peptide 
inhibitor, or ethanol vehicle as indicated in each figure for a further 24 h 
before lysis and measurement of luciferase activity by using the Dual 
Luciferase Assay kit (Promega) as described previously (20). Luminescence 
was quantified using a Lumat LB 9501 luminometer (EGScG Berthold). 

Proliferation assays. Cells were seeded in SFIH at a density of 2 to 3 x lO-* 
per well in 24-well plastic tissue culture dishes 1 d before the addition of the 


indicated concentrations of drug or ethanol vehicle. Cells were cultured for 
6 d with two medium changes before being trypsinized, resuspended in PBS, 
and counted using a Z1 Single Coulter Counter (Beckman/Coulter). At least 
three independent assays were performed in triplicate or quadruplicate, and 
the data were normalized to vehicle-treated cells. 

BrdUrd ELISAs. Cells were seeded in SFIH at a density of 1 x 10"^ cells 
per well in 96-weIl plastic tissue culture dishes 1 d before the addition 
of drug or ethanol vehicle as indicated. Cells were then cultured for — 54 h 
before the addition of BrdUrd (final concentration 10 pmol/L) for an 
additional 18 h (total incubation in drug, 72 h) before performing the Cell 
Proliferation ELISA, BrdUrd (colorimetric) assay as directed by the 
manufacturer (Roche). At least three independent assays were performed 
with five replicate wells per treatment group, and data were normalized to 
vehicle-treated cells. 

BrdUrd immunofluorescence assays. These assays were performed as 
described above (drug treatment and BrdUrd addition) and by Riggins and 
colleagues (cell seeding and staining procedures; ref. 21) with the following 
modifications: ERRy expression was detected using the HA.11 monoclonal 
antibody from Covance (1:500) followed by AlexaFluor594-conjugated goat 
anti-mouse secondary antibody (Invitrogen; 1:500), and BrdUrd incorpora¬ 
tion was detected using the AlexaFluor488-conjugated anti-BrdUrd 
antibody (1:10; BD Biosciences). Cells were visualized on a Nikon E600 
epifluorescence microscope at x20 magnification. 

Cell cycle analysis. Cells were seeded in SFIH at a density of 5 x 10"^ 
cells per well in 6-well plastic tissue culture dishes 1 d before the addition of 
1,000 nmol/L 4HT or ethanol vehicle. Cells were then cultured for 72 h before 
harvesting and cell cycle analysis by the Vindelov method (22). 

Derivation of SUM44/LCCTam cells. A TAM-resistant SUM44 variant 
was established according to previously published procedures (23). 
Subconfluent T-25 cm^ tissue culture flasks of SUM44 cells were selected 
against increasing concentrations of 4HT, beginning with 1 nmol/L. After 
3 passages of the cells at each dose, the drug concentration was increased 
(l-^-5-^-lO—>50—>100^500 nmol/L), terminating at a concentration of 
500 nmol/L 4HT. Cells proliferating in 500 nmol/L 4HT were designated 
SUM44/LCCTam (hereafter abbreviated as LCCTam). LCCTam cells were 
cultured in SFIH in the absence of 4HT for 1 wk before all experiments. 

Comparative genomic hybridization. Normal control DNA was 
prepared from peripheral blood lymphocytes of a normal donor and 
test DNA was extracted from the cultured cell lines (SUM44 and the 
TAM-resistant LCCTam variant) using standard protocols, and compar¬ 
ative genomic hybridization (CGH) was performed as previously 
described (24). Gray scale images from at least 10 metaphases from 
each hybridization were acquired with a cooled charge-coupled device 
CCD camera (CH250; Photometries) connected to a Leica DMRBE 
microscope equipped with fluorochrome specific optical filters TRl, TR2, 
TR3 (Chroma Technology). Quantitative evaluation of the hybridization 
was done using commercially available software (Applied Imaging). 
Average ratio profiles were calculated as the mean value of at least eight 
ratio images to identify chromosomal copy number changes in all cases 
(see Supplementary Fig. SI). 

RNA isolation, gene expression microarray preprocessing, and data 
analysis. Total RNA was extracted from subconfluent T-25 cm^ tissue 
culture flasks of SUM44 and LCCTam cells, then processed and arrayed as 
described by Gomez and colleagues (25). Microarray data quality was then 
assessed using several tools, including those recommended by Affymetrix 
and a series of additional QC measures under development in our 
laboratory (26). The Robust Multiple-Array Average method was used to 
preprocess the raw gene expression data, as implemented in the 
Bioconductor project.”^ We then isolated a reduced dimension data set that 
included genes that exhibit >2 fold change {P < 0.05) and genes with 
intensity >log2 (10) in both SUM44 and SUM44/LCCTam groups. Data 
visualization before and after dimensionality reduction was facilitated by 
multidimensional scaling as estimated using Principal Component Analysis 
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(PCA) and Discriminant Component Analysis (27), to ensure that the global 
structure of the data were not altered by dimensionality reduction 
procedures (see Supplementary Fig. S2). Expression data are available 
through the Gene Expression Omnibus database, accession GSE12708. 

Real-time qPCR. Total RNA from independent cultures (not RNA from 
cultures used for microarray analysis) was isolated, cleaned, quantified, and 
reverse-transcribed as described in (25). qPCR reactions for each cDNA 
sample and a standard curve were performed using TaqMan Universal PCR 
Master Mix and the following TaqMan Gene Expression Assay primers 
(Applied Biosystems): ESRl, Hs00174860_ml; ESRRG, Hs00155006_ml; and 
the housekeeping gene RPLPO (Hs99999902_ml) as in Gomez and 
colleagues (25). Expression data for each gene was estimated relative to 
the housekeeping control, and these data were used to calculate the ratio of 
expression relative to that in the parental SUM44 cell line. 

Cell lysis and Western blot analysis. Subconfluent monolayers of cells 
were harvested, lysed, and analyzed by Western blot as in Bouker and 
colleagues (28). Primary antibodies for ERRy (1:1,000), ERRa (1:500), and ERR(i 
(1:500) were purchased from GenWay. Antibodies for ERa (1:500) and ER(i 
(1:1,000) were purchased from NovoCastra and Affinity Bioreagents, 
respectively. Antibodies for FASN (1:500) and HMGCS2 (1:2,000) were 
purchased from Abeam. To confirm equal loading, membranes were reprobed 
using a (5-actin monoclonal antibody (1:5,000) purchased from Sigma, or a 
glyceraldehyde-3-phosphate dehydrogenase (GAPDH) goat polyclonal anti¬ 
body (1:5,000) purchased from Santa Cruz Biotechnology. Secondary anti¬ 
bodies conjugated to horseradish peroxidase were purchased from GE 
Healthcare and Santa Cruz Biotechnology. Densitometry was performed using 
NIH ImageJ software^ and images were compiled using Adobe Photoshop CS2. 

ERRy siRNA. LCCTam cells were seeded in 96-well plastic tissue culture 
dishes in SFIH at 1 x 10"^ cells per well 1 d before transfection with 
100 nmol/L ERRy (siERRy) or nonsilencing control siRNA oligonucleotides 
(siC) using DharmaFECTl (Dharmacon) according to manufacturers 
specifications. The next day, cells were treated with 1,000 nmol/L 4HT or 
ethanol vehicle before addition of BrdUrd for an additional 18 h (total 
incubation in drug, 48 h). Cell Proliferation ELISA, BrdUrd (colorimetric) 
assays were performed as described above. In parallel, cells were seeded in 12- 
well dishes at a density of 9 x 10"^ cells per well, transfected with 100 nmol/L 
siC or siERRy, and cells were lysed on the same day that BrdUrd ELISAs were 
performed (total transfection time, 72 h) for Western blot analysis. 

Statistics. All statistical calculations were performed using SigmaStat 
version 3.0 (Systat). Luciferase promoter-reporter, ceU proliferation, BrdUrd, 
real-time reverse transcription-PCR (RT-PCR), and microarray data from 
in vitro studies were compared using either Student’s t test or one-way 
ANOVA followed \yy post hoc t test, as appropriate, and indicated in the text 
and figure legends. Statistical significance is defined at >95% confidence 
level, or a P value of <0.05. 

Results 

SUM44 cells have functional ER and are sensitive to growth 
inhibition hy 4HT. The SUM44 breast cancer cell line was derived 
from an ILC and a high percentage of ILC tumors are ER+ (29). 
Although this cell line is also ER+ (15), ER functional status is 
unknown and SUM44 responsiveness to estrogens and antiestro¬ 
gens has not previously been determined. Therefore, SUM44 cells 
were transfected with the 3xERE-tk-luc reporter construct and 
stimulated with estrogen, antiestrogen, or ethanol control (Fig. lA). 
Estrogen (E2) modestly but significantly induces, whereas 4HT 
significantly decreases, ERE-luciferase activity (P < 0.001). We also 
observed that the steroidal antiestrogen Fulv decreases ERE- 
luciferase activity, and that both 4HT and Fulv block the E2- 
induced stimulation of ERE-luciferase activity (P < 0.001). These 
data suggest that the SUM44 ER responds appropriately to 
estrogenic and antiestrogenic stimuli. 


^ http://rsb.info.nih.gov/ij 


To determine whether SUM44 cells are sensitive to growth 
inhibition by 4HT, cells were treated with antiestrogen as indicated 
for 6 d (Fig. IB, closed circles). 4HT significantly inhibits the 
proliferation of SUM44 cells (ANOVA P < 0.001). The observed 
reduction in cell number is also reflected in an inhibition of DNA 
synthesis as shown by reduced BrdUrd incorporation after 72 hours 
of 4HT treatment (ANOVA P < 0.001; Fig. 1C, closed circles), 
consistent with the known cytostatic effect of 4HT (12). 

Generation of a TAM-resistant SUM44 variant. Because ILCs 
are predominantly ER+ and TAM has been the most widely used 
endocrine agent for the treatment of ER+ breast cancer, we sought 
to develop a TAM-resistant ILC model using the SUM44 cell line. 
Cells were selected against increasing concentrations of 4HT, and 
the cell population proliferating in 500 nmol/L 4HT (within the 
range of clinically relevant concentrations; ref. 10) was designated 
SUM44/LCCTam (hereafter called LCCTam). 

The basal growth rate of LCCTam is identical to that of 
the parental SUM44 cell line and as expected, LCCTam cells 
are no longer responsive to the antiproliferative effects of 4HT 
(Fig. IB, open triangles, N.S.), and LCCTam DNA synthesis is no 
longer inhibited by 4HT (ANOVA/’ = 0.212; Fig. 1C, open triangles).'Io 
further confirm that differences in SUM44 and LCCTam cell 
proliferation in response to antiestrogen reflect changes 
in sensitivity to the cytostatic effects of 4HT, we performed cell 
cycle analysis. SUM44 cells treated with 1 pmol/L 4HT 
show a significantly greater fraction of cells arrested in the 
Gi phase compared with ethanol-treated controls {P < 0.001; 
data not shown), whereas 4HT no longer induces an accumulation of 
LCCTam cells in Gi {P = 0.722, data not shown). Together, these 
findings show that SUM44 ceU growth and ceU cycle progression are 
efficiently inhibited by 4HT, but that LCCTam cells have acquired 
resistance to the inhibitory effects of this antiestrogen. 

Changes in the transcriptome of LCCTam cells are not 
associated with chromosomal aberrations. To characterize 
further this novel ILC cell model, we determined the pattern of, 
and differences in, genomic alterations and gene expression 
between SUM44 and LCCTam ceUs using CGH and Affymetrix 
gene expression microarray analysis, respectively. The genetic 
lineage of the two cell lines was confirmed to be identical by DNA 
fingerprinting using genetic markers at nine different loci. CGH 
analysis revealed changes in the DNA copy number (gains, losses, 
and amplifications) in both SUM44 and LCCTam (Supplementary 
Fig. SI). Importantly, a comparison between our CGH findings and 
a previously reported CGH analysis of SUM44 show a similar 
pattern of aberrations (30). We found no significant difference in 
the pattern of chromosomal alterations between the two cell lines; 
acquired estrogen independence also is not associated with 
changes in the amplification of DNA sequences (31). 

In marked contrast, microarray analysis reveals a large number of 
changes in gene expression. We used PCA (27) to visuaUze the high- 
dimensional data set in two dimensions; SUM44 and LCCTam are 
linearly separable in this two-dimensional PCA projection based on 
the top two principal components that capture 95% of the 
cumulative variance in the data (Supplementary Fig. S2). Using a 
final cutoff of > 2-fold change with P < 0.05 (univariate, 
uncorrected, T-statistic), we find that 380 genes are likely to be 
significantly altered: expression of 91 genes are increased and 289 
genes are decreased in LCCTam versus SUM44 controls (Supple¬ 
mentary Table SI). 

To maintain focus on the TAM-resistant phenotype observed 
in LCCTam cells, we first chose to investigate gene expression 
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changes in ERs and other members of the nuclear receptor 
superfamily. Expression of ERa (HUGO symbol ESRl) is 
decreased 3.1-fold in LCCTam compared with SUM44 cells by 
microarray (P = 0.0013), which was subsequently confirmed by 
qPCR analysis (|,2.98-fold, P < 0.001; Fig. 2A, white bars). In 
contrast, expression of the orphan nuclear receptor ERRy 
(HUGO symbol ESRRG) is 4.4-fold increased in the resistant 
LCCTam cells by microarray (P = 0.01) and 10-fold increased by 
qPCR (P = 0.03; Fig. 2A, black bars). 

To confirm that differences in the mRNA expression of these 
receptors are maintained at the protein level, cell lysates were 
collected and analyzed for ERRy and ERa expression by Western 
blot (Fig. 2B, inset). As observed for mRNA, ERRy protein 
expression is increased (t2.5-fold; P = 0.03) and ERa expression 
is decreased (|,2-fold; P = 0.03) in LCCTam cells. We also examined 
the protein levels of aU other ERs and ERRs (ERp, ERRa, and 
ERRp) and find no differences in their expression between SUM44 
and LCCTam cells (Fig. 2C). 

ERRy plays a functional role in TAM resistance in LCCTam 
cells. ERRy is an orphan nuclear receptor with no known natural 
ligand that has been shown to have constitutive transcriptional 
activity at several DNA response elements (reviewed in refs. 32, 33). 
ERRy and its family members ERRal and ERR(3 bear some 
structural similarity to the ER (32, 34). Although ERRal has 
previously been shown to activate or repress estrogen response 
element (ERE)-mediated transcription depending on cellular 
context (34) and to participate in HER2-dependent signaling in 
BT474 breast cancer cells (35), the role of ERRy in breast cancer 
therapeutic response is underexplored (36). 

We hypothesized that if increased expression of ERRy in 
LCCTam cells performs a functional role in the acquired TAM 
resistance phenotype, knockdown of receptor expression should 
restore TAM sensitivity. LCCTam cells were transiently transfected 
with siRNA oligonucleotides directed against ERRy (siERRy) or a 
nonsilencing control (siC) before treating the cells with 4HT 
and assessing DNA synthesis as measured by BrdUrd incorpora¬ 
tion. A 2- to 3-fold decrease in ERRy expression is attained by 
siRNA (P < 0.001; Fig. 3A). Importantly, ERRy knockdown also 
partially restores sensitivity to 4HT in the LCCTam cells (P = 0.03 
versus siERRy ethanol and P < 0.001 versus siC in 1,000 nmol/L 
4HT; Fig. 3B) with no effect on the expression of ERa (Fig. 3A, 
inset, bottom). These data suggest that ERRy plays a key functional 
role in the LCCTam TAM resistance phenotype. 


Figure 1. SUM44 cell proliferation and ER transcriptional activity are inhibited 
by antiestrogens, and LCCTam cells have acquired resistance to TAM. A, cells 
were seeded in 12-well tissue culture dishes, transfected with plasmids encoding 
3xERE-tk-luciferase and phRL-SV40 Renilla, and treated with 10 nmol/L E2, 
1,000 nmol/L 4HT, 100 nmol/L Fulv, 4 HT-hE 2, Fulv-rE2, or ethanol control for 
24 h before han/est and luciferase assay. ERE-luciferase values are normalized 
to Renilla activity to obtain Relative Light Units, and data are presented as the 
mean relative to ethanol ± SE for a representative experiment performed in 
triplicate. ANOVA P < 0.001; P < 0.001 for comparisons to ethanol and 
P < 0.001 for comparisons to E2 by post hoc Student’s f test. B, cells were 
seeded in 24-well tissue culture dishes and treated with the indicated 
concentrations of 4HT for 6 d, at which time cell number was determined. Points, 
mean proliferation relative to ethanol for a representative experiment performed 
in quadruplicate; bars, SE. ANOVA P < 0.001 for SUM44, and not significant 
(N.S.) for LCCTam. C, cells were seeded in 96-well tissue culture dishes 1 d 
before treatment with the indicated concentrations of 4HT for a total of 72 h; 
BrdUrd was added for the last 18 h of culture. Points, mean BrdUrd incorporation 
relative to ethanol for a representative experiment performed in quintuplicate; 
bars, SE. ANOVA P < 0.001 for SUM44, and P = 0.212 (N.S.) for LCCTam. 


Overexpression of ERRy induces 4HT resistance. Next, we 
sought to determine whether ERRy overexpression could induce 
TAM resistance in endocrine-responsive breast cancer cells. SUM44 
cells grown on fibronectin-coated coverslips were transiently 
transfected with the pSG5-HA-ERR3 plasmid, encoding the murine 
homologue of ERRy, which is 100% identical to human ERRy at the 
amino acid level (19), or the empty vector (pSG5). Cells were then 
treated with 1 pmol/L 4HT or ethanol vehicle and immunostained 
for BrdUrd incorporation (green) and ERRy expression (HA, red-, 
ref. 21). In agreement with our results in Fig. 1C, 4HT significantly 
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Figure 2. ERa and ERR 7 mRNA and protein expression are significantly altered 
during the acquisition of TAM resistance. A, totai RNA was isoiated from SUM44 
and LCCTam ceils, reverse-transcribed, and subjected to RT-PCR to detect 
ERR 7 (HUGO gene symbol ESRRG), ERa {ESR1), and the housekeeping gene 
RPLPO. Columns, mean target gene/RPLPO ratio for three samples analyzed in 
triplicate; bars, SE. *P = 0.03 for ESRRG and *P < 0.001 for ESR1 in SUM44 
versus LCCTam by Student’s t test. B, densitometric quantification of protein 
expression and a representative Western blot are shown for expression of ERR 7 , 
ERa, and the GAPDH loading control. *P = 0.03 for ERR 7 and ERa in SUM44 
versus LCCTam by Student’s t test. C, representative Western blot showing 
ERRa, ERR( 5 , and ERp expression, and the GAPDH loading control, in SUM44 
and LCCTam cells. 


reduces BrdUrd incorporation in SUM44 cells transfected with the 
empty vector pSG5 (P < 0.001; Fig. 4A, ii versus iv, 48.9% versus 
16.9% BrdUrd incorporation). However, 4HT can no longer 
inhibit DNA synthesis when ERRy is overexpressed (P < 0.001; 
Fig. 4A, iv versus viii, 16.9% versus 53.9% BrdUrd incorporation). 
The effect of ERRy overexpression is particularly striking when 
comparing BrdUrd incorporation in transfected versus untrans¬ 
fected cells in the presence of 4HT within the same field of 


view. In Fig. 4A, most ERRy-positive (red) cells incorporate 
BrdUrd (viii, arrowheads), whereas ERRy-negative cells show 
little-to-no BrdUrd incorporation (viii, *). 

To confirm that ERRy can regulate TAM resistance in 
breast cancer cell lines other than SUM44, we performed the same 
study in MDA-MB-134 VI cells, which are ER-i- and TAM-sensitive 
(37) and are also considered to be of lobular origin (38). When 
transfected with the pSG5 empty vector, DNA synthesis in MDA-MB- 
134 VI cells is inhibited by 4HT by nearly 2-fold (49.9% versus 27.3% 
BrdUrd incorporation, P < 0.001; Fig. 4B). However, when ERRy is 
overexpressed, these cells become significantly less responsive to the 
inhibitory effects of 4HT (27.3% versus 44.7% BrdUrd incorporation, 
P = 0.001; Fig. 4B). Together, these data show that increased 
expression of ERRy can induce TAM resistance in several ER-i- 
lobular breast cancer cell hnes. 
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Figure 3. siRNA knockdown of ERRy in LCCTam cells restores TAM sensitivity. 
A, cells seeded in 12-well dishes were transfected with control (s/C) or 
ERRy-specific (siERRy) oligonucleotides (final concentration, 100 nmol/L) for 
72 h before lysis, Western blot analysis, and densitometry. Columns, mean 
ERRy/p-actin ratio for three independent experiments; bars, SE. Inset, a 
representative image. *P < 0.001 for siERRy versus siC, Student’s f test. 
ERRy knockdown has no effect on ERa expression (inset, bottom). B, cells 
seeded in 96-well dishes were transfected with siC or siERRy oligonucleotides 
24 h before treatment with 1,000 nmol/L 4HT or ethanol control. BrdUrd 
assays were performed as described above; columns, mean BrdUrd 
incorporation relative to ethanol for a representative experiment performed in 
quintuplicate; bars, SE. *P = 0.03 versus siERRy ethanol, and ''P = <0.001 
versus siC in 1,000 nmol/L 4HT by Student’s t test. 
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Figure 4. ERRy overexpression in SUM44 and MDA-MB-134 VI breast cancer 
cells induces TAM resistance. A, SUM44 cells were seeded on fibronectin-coated 
coverslips, then transfected with pSG5-HA-ERR3 {ERFh<) or empty vector 
ipSGS). Cells were treated with 1,000 nmol/L 4HT or ethanol control for a total 
of 48 fi; BrdUrd was added for fhe lasf 18 h of culfure before fixafion and 
processing for HA (ERR] ) and BrdUrd immunosfaining. Representafive 
phase-confrasf and fluorescent images are shown (red, ERRy; green, BrdUrd); 
arrowheads, ERRy-positive cells; *, ERRy-negative cells. Quantitative data 
are presented as the mean percent BrdUrd incorporation for a represenfafive 
experimenf in which 4 fo 5 microscopic fields (>500 fofal cells) were counfed 
per condition. ANOVA P < 0.001; P < 0.001 for pSG5-4HT versus pSG5+4HT, 
and P < 0.001 for pSG5+4HT versus ERRy+4HT by post hoc Studenf's t tesf. B, 
MDA-MB-134 VI cells were seeded, fransfecfed, drug treated, and stained as 
described in A. Columns, mean percent BrdUrd incorporation for a representafive 
experimenf in which 4 fo 5 fields (and >500 fofal cells) per condifion were counfed; 
bars, SE. ANOVA P < 0.001; *, P < 0.001 for pSG5-4HT versus pSG5-t4HT, 
andP = 0.001 for pSG5-t4HT versus ERRy-t4HT by post hoc Sfudent’s f fest. 


ERR 7 -associated transcriptional activity is increased in 
resistant LCCTam cells. A crucial difference between ERRy and 
liganded nuclear receptors like ERa is the regulation of their 
transcriptional activities. Whereas ERa is dependent on ligand 
for full activation, ERRy and the other members of this orphan 
family exhibit constitutive transcriptional activity. The ERRy 
DNA binding domain is ~64% identical to that of ERa (34). 
Consequently ERRy can bind to the same EREs as ERa, but it 
can also potently activate the steroidogenic factor-1 response 
element (SEIRE; ref. 32). Although none of the ERR family 
members are affected by E2 stimulation because their ligand 
binding domains cannot accommodate E2 binding (discussed in 
ref 34), ERRy transcriptional activity at EREs and SFlREs can be 
inhibited by 4HT (39, 40). In contrast, 4HT-bound ERRy acquires 
the ability to positively regulate transcription at API sites 
(reviewed in ref 34). 

To begin to understand the mechanism by which ERRy up- 
regulation confers resistance to LCCTam cells, we examined the 
activity of ERE-, SE1RE-, and API-driven luciferase promoter- 
reporter constructs transiently expressed in SUM44 and LCCTam 
cells (Fig. 5A). Luciferase expression controlled by the ERE and 
SFIRE response elements is significantly increased by 5- and 
3-fold, respectively, in LCCTam cells compared with SUM44 cells 
(P < 0.005). When LCCTam cells are cultured in 4HT (“LCCTam-i- 
4HT”), ERE-luciferase activity is somewhat reduced but still 
shows a nearly 2-fold increase relative to SUM44 (black bars-, 
P < 0.005), whereas SFlRE-luciferase activity remains high 
(white bars-, 3-fold above the levels in SUM44 cells; P < 0.005). 
In contrast, API-luciferase activity increases up to 8-fold that 
observed in SUM44 cells in the presence of 4HT (hatched bars-, 
P < 0.005). 

ERRy/APl activity seems to drive TAM resistance in 
LCCTam cells. To test whether the observed robust API activity 
plays a functional role in the TAM-resistant phenotype, we used a 
cell-permeable peptide fragment of c-JUN that blocks its 
interaction with the JUN NH 2 -terminal protein kinase, resulting 
in strong API inhibition (41). This c-JUN peptide has virtually no 
effect on SElRE-luciferase activity (Fig. SB, NS.) but can inhibit 
API-luciferase activity by >2-fold (P = 0.04; Fig. 5C). Importantly, 
this level of API inhibition restores 4HT-mediated growth 
inhibition to LCCTam cells (P = 0.001; Fig. 5D) and enhances 
the sensitivity of the parental SUM44 cells to the growth- 
inhibitory effects of 4HT (P = 0.002). 

Our functional data suggest that in LCCTam cells, increased 
ERRy-driven API transcriptional activity is most strongly associ¬ 
ated with TAM resistance. However, endogenous ERRy/APl target 
genes have yet to be identified; ERRy-dependent API activity has 
previously been reported only on heterologous promoter con¬ 
structs (42). We therefore used the TRANSFAC Professional 11.1 
database (43) to search the proximal promoter regions of genes up- 
regulated >2-fold in LCCTam cells for API consensus sites within 
5,000 bp of the start site. The MatchTM algorithm (44) was used to 
analyze the DNA sequences and search for potential API binding 
sites, using Position Weight Matrices to minimize false positives. 
Several genes had multiple API response elements in their 
promoter regions (Pig. 6A). Western blot analysis was then used 
to confirm the overexpression of two of these genes, HMGCS2 and 
FASN (Fig. 6B). HMGCS2 is a nuclear-encoded mitochondrial 
matrix gene that can regulate ketogenesis and cholesterol synthesis 
(45, 46), and FASN is the final enzyme of the fatty acid biosynthetic 
pathway (47). Components of all three processes have been 
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implicated in the etiology or progression of breast cancer, and 
FASN activity can affect hormonal sensitivity in breast and 
endometrial cancer cells (48-50). Therefore, we suggest that 
HMGCS2 and FASN may be two novel ERRy/APl targets in 
TAM-resistant breast cancer. 

Discussion 

In this study, we report the development of the first model 
of endocrine-resistant breast cancer in a cell line derived 
from an invasive lobular breast carcinoma, and show that the 


orphan nuclear receptor ERRy and its ability to drive API 
transcriptional activity are central to the TAM resistance 
phenotype. 

Selection of SUM44 cells against 4HT led to the establishment 
of the LCCTam cell line, which is stably resistant to TAM. In the 
resistant LCCTam cells, we observe a significant down-regulation 
of ERa (although they remain ER+), accompanied by a 
significant increase in the expression of ERRy. Resistance to 
antiestrogens has been hypothesized to take place through 
several diverse mechanisms (10, 12). One is loss or mutation of 
ERa, whereas others include alterations in the profile of 
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Figure 5. ERR^-associated transcriptional activity is increased in resistant LCCTam cells, and inhibition of API restores TAM sensitivity. A, cells were seeded and 
transfected with 3xERE-, 3xSF1 RE-, and 3xAP1-luciferase and phRL-SV40 Renilla, incubated, lysed, and analyzed as described for Fig. 1 A. LCCTam+4HT, cells that 
were cultured in SFIH containing 500 nmol/L 4HT. ANOVA P < 0.001, and *P < 0.005 for all comparisons versus SUM44 by post hoc Student’s t test. B, cells 
were seeded and transfected with 3xSF1 RE-luciferase and phRL-SV40 Renilla for 1 d before treatment with 20 pmol/L c-JUN peptide or PBS control (Ctrl.). Cells were 
then incubated, lysed, and analyzed as described for Fig. 1A; LCCTam+4HT is as described above. C, cells were seeded and transfected with 3xAP1-luciferase, then 
treated, harvested, and analyzed as described. LCCTam+4HT is as described above. ANOVA P < 0.001; *P = 0.04 for control versus c-JUN peptide by post hoc 
Student’s t test. D, cells were seeded in 96-well dishes 24 h before treatment with 1,000 nmol/L 4HT or ethanol control in the presence of 20 pmol/L c-JUN peptide or 
PBS control. BrdUrd assays were performed as described above; columns, mean BrdUrd incorporation relative to ethanol for a representative experiment performed in 
quintuplicate; bars, SE. *, P = 0.002 for SUM44, control versus c-JUN in the presence of 4HT, and P = 0.001 for LCCTam, control versus c-JUN in the presence 
of 4HT, by Student’s t test. 
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Figure 6. Genes overexpressed in LCCTam cells contain multiple API response elements in their promoter regions. A, genes overexpressed by >2-fold in 
LCCTam versus SUM44 cells in the microarrays were screened for the presence of API response elements in their proximal promoter regions using TRANSFAC 
Professional 11 .f. a, the number of consensus sites found within 5,000 bp of fhe franscripfional start site. B, representative Western blot showing HMGCS2 and FASN 
expression, and the GAPDH loading control, in SUM44 and LCCTam cells. 


hormone receptor coactivators and corepressors expressed by 
the tumor, differential metabolism of antiestrogens, and changes 
in the expression of additional genes that control cell 
proliferation and/or apoptosis (13). One or more of these 
mechanisms is likely contributing to the TAM-resistance 
phenotype of LCCTam cells. Relative to SUM44, the resistant 
LCCTam cells express 3-fold less ERa. However, SUM44 cells 
express high basal levels of ERa.® Consequently, the reduced 
level of ERa expression in LCCTam is comparable with that 
observed in MCF-7 breast cancer ceUs (~ 73% of basal MCF-7 
ERa levels by qPCR; data not shown). Because ERa levels in 
MCF-7 cells are clearly sufficient to confer antiestrogen 
sensitivity, it is unlikely that ERa down-regulation in LCCTam 
is the major determinant of resistance in this model. 

Our siRNA knockdown and cDNA overexpression studies are 
the first to show that ERRy is an essential regulator of TAM 
responsiveness in lobular breast cancer cells. Until now, the role 
of ERRs (and specifically ERRy) in breast cancer therapeutic 
response has not been well-understood. In 2002, Ariazi and 
colleagues (36) published a study of ERR family expression in 38 
breast tumors compared with normal mammary epithelial cells 
(MEC). ERRy mRNA expression is nearly 4-fold higher in breast 
tumors than in MECs and is positively associated with ER and 
PR expression. These authors conclude that the correlation of 
ERRy with ER and PR is indicative of a better prognosis (36). 
Although this is certainly plausible, the presence of ER and PR 
do not always indicate hormone sensitivity in breast cancer. As 
discussed above, TAM therapy is ineffective in ~ 30% of patients 
with ER+/PR+ breast tumors, and the majority of initial 
responders who acquire resistance to TAM and other endocrine 
agents do so without losing detectable ER expression (10). 4HT- 
bound ERRy is also known to activate transcription at API sites, 
and elevated API activity has previously been linked to TAM 
resistance in vitro (51, 52) and in vivo (53, 54). This is consistent 


^ DA. Zajchowski and S.P. Ethier, unpublished data. 


with our findings that API activity is robustly increased in the 
resistant LCCTam cells in the presence of 4HT, and that API 
inhibition reverses the TAM-resistant phenotype of LCCTam cells 
while increasing the sensitivity of SUM44 cells to growth 
inhibition by this antiestrogen. To our knowledge, this is the 
first functional consequence of ERRy-driven API transcriptional 
activity that has been reported. 

No endogenous ERRy/APl target genes have yet been 
identified. The genes in Fig. 6A are strong candidates as 
ERRy/APl targets in breast cancer. We confirmed the differential 
regulation of the endogenous HMGCS2 and FASN, and we 
propose that HMGCS2 and FASN are putative downstream 
targets of ERRy in the resistant LCCTam ceU line. Further 
assessment of their direct regulation by ERRy/APl is in progress. 
HMGCS2 is a nuclear-encoded mitochondrial matrix gene that 
can regulate ketogenesis and cholesterol synthesis (45, 46) and 
FASN is the final enzyme of the fatty acid biosynthetic pathway 
(47). Components of aU three processes have been implicated in 
the etiology or progression of breast cancer, and FASN activity 
can affect hormonal sensitivity in breast and endometrial cancer 
cells (48-50). Moreover, ERRy has been shown to control the 
switch from fetal use of carbohydrates to lipid-dependent 
oxidative metabolism in the adult mouse heart by regulating a 
series of genes that drive fatty acid oxidation, oxidative 
phosphorylation, and mitochondrial electron transport (55). That 
ERRy might also affect these metabolic processes in the context 
of breast cancer and TAM resistance is intriguing and will be the 
focus of future studies. Notably, this possibility is supported by a 
very recent publication by Montero and colleagues (56), which 
reports that increased mitochondrial cholesterol content pro¬ 
motes resistance to doxorubicin in hepatocellular carcinoma. 
Other genes in Fig. 6A also are of interest. High LRRC15 
expression has been previously linked to invasive and aggressive 
behavior in breast and prostate cancer (57, 58), and MSN is a 
marker of basal-like breast cancers (59); our future studies will 
also pursue the role(s) of these genes in endocrine-resistant 
breast cancer and their regulation by ERRy/APl. 
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ABSTRACT 

Motivation: Significant efforts have been made to acquire data 
under different conditions and to construct static networks that 
can explain various gene regulation mechanisms. However, gene 
regulatory networks are dynamic and condition-specific; under 
different conditions, networks exhibit different regulation patterns 
accompanied by different transcriptional network topologies. Thus, 
an investigation on the topological changes in transcriptional 
networks can facilitate the understanding of cell development or 
provide novel insights into the pathophysiology of certain diseases, 
and help identify the key genetic players that could serve as 
biomarkers or drug targets. 

Results: Here, we report a differential dependency network (DDN) 
analysis to detect statistically significant topological changes in 
the transcriptional networks between two biological conditions. We 
propose a local dependency model to represent the local structures 
of a network by a set of conditional probabilities. We develop an 
efficient learning algorithm to learn the local dependency model using 
the Lasso technique. A permutation test is subsequently performed 
to estimate the statistical significance of each learned local structure. 
In testing on a simulation dataset, the proposed algorithm accurately 
detected all the genes with network topological changes. The 
method was then applied to the estrogen-dependent T-47D estrogen 
receptor-positive (ER+) breast cancer cell line datasets and human 
and mouse embryonic stem cell datasets. In both experiments using 
real microarray datasets, the proposed method produced biologically 
meaningful results. We expect DDN to emerge as an important 
bioinformatics tool in transcriptional network analyses. While we 
focus specifically on transcriptional networks, the DDN method we 
introduce here is generally applicable to other biological networks 
with similar characteristics. 

Availability: The DDN MATLAB toolbox and experiment data are 
available at http://www.cbii.ece.vt.edu/software.htm. 

Contact: yuewang@vt.edu 


*To whom correspondence should be addressed. 


Supplementary information: Supplementary data are available at 
Bioinformatics online. 


1 INTRODUCTION 

Recent advances in high-throughput genomic technologies provide 
ample opportunities to study cellular activities at the individual gene 
expression and network levels, while also presenting new challenges 
for data analysis (Clarke et al., 2008). Discovering the mechanisms 
that orchestrate the activities of genes and proteins in cells remains 
one of the key goals of systems biology studies (Kitano, 2002). 
Several approaches have been proposed to model genetic regulatory 
networks (Li et al., 2008), such as Bayesian networks (Friedman, 
2004; Friedman etal., 2000; Flusmeier, 2003), probabilistic Boolean 
networks (Shmulevich et al., 2002), state-space models (Rangel 
et al., 2004) and network component analysis (Liao et al., 2003). 
These methods attempt to construct a static network that can explain 
various gene regulation programs. 

Flowever, genetic regulatory networks are context-specific and 
dynamic in nature (Beyer et al., 2007; Clarke et al., 2008). 
Under different conditions, different regulatory components and 
mechanisms are activated and the topology of the underlying gene 
regulatory network changes accordingly. For example, in response 
to diverse conditions in the yeast, transcription factors alter their 
interactions and rewire the signaling networks (Luscombe et al., 
2004). While the inference of transcriptional networks using data 
from composite conditions could sometimes be contradictory due 
to changes in the underlying topology, most network learning 
algorithms assume an invariant network topology (Friedman et al., 
2000; Rangel et al., 2004; Shmulevich et al., 2002). Therefore, 
some methods have been presented to learn condition-specific 
transcriptional networks in yeast (Kim et al., 2006; Segal et al., 
2003). It is important to focus on and examine the topological 
changes in transcriptional networks between disease and normal 
conditions or under different stages of cell development. For 
example, a deviation from normal regulatory network topology may 
reveal the mechanism of pathogenesis (Hood et al., 2004), and the 
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genes that undergo the most network topological changes may serve 
as biomarkers or drug targets. 

Several methods have been proposed to utilize network topology 
information to carry out various bioinformatics tasks. Liu et al. 
(2006) introduced a topology-based cancer classification method, 
where correlation networks were first constructed and later 
used to perform classification. Fuller et al. (2007) developed 
weighted gene co-expression network analysis strategies, via single 
network analysis and differential network analysis, to identify 
physiologically relevant modules. Qiu et al. (2005, 2007) proposed 
an ensemble dependence model to detect the dependence changes 
of gene clusters between cancer and normal conditions for cancer 
classification, and further extended the dependence model to 
dependence networks. Wei and Li (2007) introduced a Markov 
random field model for network-based analysis of genomic data 
that utilizes the known pathway structures to identify differentially 
expressed genes and sub-networks. 

In this article, we propose a differential dependency network 
(DDN) analysis to model and detect the statistically significant 
topological changes in transcriptional networks between two 
conditions. We use local dependency models to characterize the 
dependencies of genes in the network and represent local network 
structures. Local dependency models decompose the whole network 
into a series of local networks, which serve as the basic elements 
of the network used for statistical testing. Unlike other dependency 
models that consider only pairwise relationships (Choi et al., 2005; 
Fuller et al., 2007; Kostka and Spang, 2004; Watson, 2006) or 
binding triples (Qiu et al., 2007), the local dependency models 
select the number of dependent variables automatically by the Lasso 
method (Tibshirani, 1996), and thereby learn the local network 
structures. Subsequently, we perform permutation tests on the 
local dependency models under two conditions and assign the 
P-values to the local structures. It may seem straightforward to 
construct an entire network under each condition and compare 
the differences between the two networks (Fuller et al., 2007; 
Qiu et al., 2007). However, in realistic applications this approach 
runs into the difficulty that the network structure learning can be 
inconsistent with a limited number of data samples. The detection 
procedure proposed here assures the statistical significance of the 
detected network topological changes by performing a permutation 
test on individual local structures. We also pinpoint ‘hot spots’ in 
the network where the genes exhibit network topological changes 
between two conditions above a given significance level. Lastly, 
we extract and visualize the DDN, i.e. the sub-network showing 
significant topological changes. We demonstrate the usefulness of 
the proposed method on both simulated and real microarray data. 
Tested on a simulation dataset, the proposed algorithm accurately 
captured the genes with network topological changes. When applied 
to the estrogen-dependent T-47D estrogen receptor-positive (ER-I-) 
breast cancer cell line datasets and human and mouse embryonic 
stem cell (ESC) datasets, the DDN analysis obtained biological 
meaningful and promising results. 

2 METHODS 

2.1 Local dependency models 

Given a set of random variables X= {Xj ,X 2 , ... ,Xm], a dependency network 
for X is modeled by a set of local conditional probability distributions, one 


for each node given its parents, denoted as Z,, which satisfies 

P(X,|Z,)=P(X,|X_,) (1) 

where X_, = {Xi,X 2 ,...,X,_i,X,+i,...,X m) and Z,cX_,-. P(X,|Z,) also 
represents the local structure of node X/, i.e. the relationship of node Xj 
and its parents Z/ on the graph (Heckerman et al., 2000). 

Inspired by this formulation, we propose a local dependency model to 
describe the dependencies of genes in a transcriptional network. Unlike 
a conventional dependency network approach, where there is only one 
conditional probability distribution for each node given its parents, our local 
dependency model allows more than one conditional probability distributions 
for each node. Mathematically, suppose there are M genes in the network of 
interest, and the dependencies of gene i on other genes are formulated by a 
set of conditional probabilities, 

P, = {P(X.|Z, 1 ),P(X,-|Z,-, 2 ),...,P(X. iz,.,,)} , 1 = 1,2,... ,M (2) 

where Z/,i,Z/, 2 ,...,Z/,.y. are some subsets of X_/ and si is the number of 
conditional probabilities for random variable X/. We use X, to refer both 
to the expressions of gene i and to its corresponding node on the graph. 
This modification is primarily based on the following considerations. First, 
our goal is not to construct the entire network that represents the full joint 
distribution of all variables, rather we wish to model the local structures 
for further statistical testing. Second, many genes are highly correlated and 
the data points are very limited when extracting most biological networks. 
Through our experiments, we found that the conventional approach misses 
some meaningful dependency connections in data-sparse situations. For 
example, regulator genes R1 and R2 have the same target gene A, and 
the expression patterns of Rl, R2 and A are highly correlated. When the 
data points are few, the standard approach may only select one of the 
dependencies, for instance, gene A on gene Rl, even though the dependency 
of gene A on gene R2 is only slightly less significant than the dependency of 
gene A on gene Rl. However, the dependencies of gene A on genes Rl and 
R2 are both important, and we want to keep the rich structural information 
for later step to assess the topological changes. Therefore, to retain more 
meaningful local structure information, instead of selecting ‘the best’ local 
structure, we select a set of ‘sufficiently good’ local structures for further 
statistical testing. We achieve this goal by allowing each node to be modeled 
by more than one conditional probability distribution. 

2.2 Local structure learning 

The conditional probability distributions in Equation (2) can be inferred by 
regression methods. In our approach, we consider a linear regression model 
in which the variable X/ is predicted by a linear function of Z, 

X, = p'fZ,-|-£,, i=l,2,...,M (3) 

where Z/ e {Z,j ,Z,, 2 , •••,Z,^. 5 .) is a column vector of random variables, p is a 
column vector of unknown parameters. The random error £/ is independent of 
Zi and is assumed to have normal distribution N{0, erf-). The local conditional 
probability P(X, |Z,) is given by 

P(X,|Z,)=x(pTz.^(,2^ (4) 

Learning the structure of the local dependency model requires the selection 
of a Zi that shows good predictability of X, . Given a predefined maximum 
size of Z/, K, we examine all C '^_2 combinations of the elements in X_/ with 
size K. K can be empirically set to a positive integer between 1 and M—\. 
When K=\, the proposed local dependency model only considers pairwise 
relationships. When K = M—\, the proposed local dependency model is 
equivalent to standard dependency networks as described in Equation (1) 
(Heckerman et al., 2000). 

Suppose one X-combination of X-_, is {Xk-^,Xk 2 ,---,Xki^], where 
k\ ,k 2 ,... ... ... ,M], and there are N expression 

samples. Lower case letter Xi{j) denotes the y-th sample value taken by the 
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variable Xi, j=\,2, ...,N. We perform a Li constrained regression of Xj on 


tasso = argmin 11] (W “ W 

[j=l \ 1=1 


K 

1=1 


(5) 


Equation (5) is known as the Lasso estimator (Tibshirani, 1996), which 
minimizes L 2 norm loss with constraint on the L\ norm of p. The nature of L\ 
constraint tends to make some coefficients in pL^ggQexactly zero, and hence 
it automatically selects a subset of features and leads to a simpler model that 
avoids overfitting the data, and therefore usually has better generalization 
performance. The parameter t>0 controls the amount of shrinkage that is 
applied to the estimates. In our software implementation, parameter t is 
determined by 5-fold cross-validation. Solving Equation (5) is a convex 
optimization problem, and can be solved very efficiently. We adopt the 
least angle regression (LARS) method to solve this problem. The detailed 
procedure of LARS can be found in Efron et al. (2004). 

We also use a prescreening strategy to release the computational burden. 
We first regress Xj on Z, = {Xk^ , Xk 2 , • • •, Xk^^ }, using the ordinary least square 
method 


\ ^ ( 


Pols = argmin | L E' W “ 


1;=1 V 



If the corresponding mean square error (MSE) is above a predetermined 
threshold T, which means Xj cannot be accurately predicted by the subset 
{X}i^,Xk 2 ^---^Xkf^], the subset {Xk^,Xk 2 ^---,Xkf^] will be discarded. If the 
MSE is below T, we will then perform the L{ constrained regression of Xj. 

We perform the above prescreening and local structure learning with 
the Lasso on each of -combinations of X_,, and obtain predictor 
sets Z,,i,Z,, 2 ,and the conditional probability distributions 'Pj = 
{n^HZ^l),/^(^/|Z/.i),...,P(X^Z,■,-)} for node 

To measure how well variables Z/ can predict Xj, or how well the local 
dependency model fits gene expression microarray data, we further introduce 
the definition of coefficient of determination (COD) 


COD- (zQI 

var{Xi] 

where var{-} is the estimate of the variance of the random variable and 
fxi\Zi(-) is the best function in a given function class that minimizes the 
residual variance. COD has been successfully used in non-linear signal 
processing and probabilistic Boolean network inference (Shmulevich et al., 
2002). Here we only use linear functions, and var{X/—/}^-|Z,(Z,)} is an 
estimate of af in Equation (4). 


2.3 Detection of statistically significant topological 
changes 

To detect the statistically significant network topological changes between 
two experimental conditions, we assume there are M genes in the network 
of interest, and N\ samples from condition 1 and N 2 samples from 
condition 2. We further denote the datasets from two conditions by = 
[xCm)^!) x('«)( 2 ),...,x^'”^(Am)], where superscript (m) indicates condition m, 
m= 1,2. The bold face lower case letter denotes the column vector 

, where lower case letter denotes the 7 -th 

sample value taken by variable Xj under condition m. 

By applying the learning procedure to datasets and respectively, 
we obtain = {P(A, |ZjV),/’(A,|Z| 2 )^ ■■■ ’^(-^/|Z^^ n))} under condition 1 

’ ’ i.Sj 

and 'P® = {P(Aj|Z?^^),P(Aj|Z? 2 ),...,P(A,|Z^^^^ 2 ))} under condition 2 for 

each node i, i=l,2,...,M. Then we take the union of the local structures 
learned under two conditions 

Vi = Vf'*UVf\ i = ( 8 ) 

for further statistical testing. 


For each conditional probability distribution in 7^,, z = 1,2,...,M, 
for instance, P(Xi\Zj)e'Pi, we perform a permutation test to assess 
how significantly it is different between two conditions. Given samples 
= 1 , 2 ,under the first condition and 
= 1,2,... ,N 2 } under the second condition, we 
calculate COD^^> and COD^^)^ using Equation (7). A test statistic 6 is defined 
by the absolute difference of the coefficients of determination under two 
conditions 

0 = |COD<‘>-COD®| (9) 


We want to test the null hypothesis, Hq, of no difference between 
and P®(X,|Z,). We first combine = 1,2,...,Wi) 

and {[x®(/®),zP(/®)]^,j® = 1 , 2 ,...,A^ 2 ), and then randomly permute 
samples from two conditions and divide the data into two sets of N[ and N 2 
samples, respectively. We perform the above procedure B times, where B is 
set to 5000 in our software implementation, and calculate 0^, b= 1,2,...,B 
according to Equation (9). An estimate of the achieved significance level 
(ASL) of the test is 


ASL= 


B 

El 

b=l 


{o;>e} 


B 


( 10 ) 


where the random variable 0^ is generated by permutation and 
denotes the indicator function, which takes 1 when 0^>0 and 0 otherwise. 
The smaller the value of ASL, the stronger the evidence against Hq is. 
Equation (10) also is an estimate of the P-value. The detailed permutation 
procedure is described in Efron and Tibshirani (1993). This detection 
procedure is performed on every local structure in 'Pj,i=l,2,...,M, and 
each local structure is assigned a P-value. 


2.4 Identification of the ‘hot spots’ in the network and 
extraction of the DDN 

Given a user-defined P-value cutoff, we obtain a set of statistically significant 
differential local structures. The nodes in these differential local structures 
are identified as ‘hot spots’ in the network, which are the genes undergoing 
topological changes defined by a specified significance level. These genes 
may correspond to the genes in disease- or process-related pathways. 

DDN is the focused sub-network that exhibits the topological changes. 
We consider a connection to exist from each element in Z, to Xj under 
one specific condition if the variance of P(X, |Z,) is below the user-defined 
threshold T for that condition (see Supplementary Material for discussions on 
the selection of T). We use different colors to represent connections appearing 
under different conditions. DDN provides a way to visualize the topological 
changes, and when applied to disease studies, DDN extracts and focuses on 
the disease-related pathways that may contribute to the understanding of the 
mechanism of the disease. 


3 RESULTS 

3.1 A simulation experiment 

We first used the software SynTReN (Van den Bulcke el al., 2006) 
to generate one simulation dataset of a sub-network drawn from an 
existing signaling network in Saccharomyces cerevisiae. Then we 
changed part of network topology and used SynTReN to generate 
another dataset according to this modified network. The network 
topology under two conditions is shown in Figure 1. The network 
contains 20 nodes that represent 20 genes. The black lines indicate 
the regulatory relationships that exist under both conditions. The 
red and green lines are the regulatory relationships that only exist 
under conditions 1 and 2, respectively. The sub-network comprised 
of nodes MBP1_SWI6, CLB5, CLB6, PH02, FLOl, FLOlO and 
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Fig. 1. The network topology under two conditions in the simulation 
study. Nodes in the network represent genes. Lines in the network indicate 
regulatory relationships between genes. The black lines are the regulatory 
relationships that exist under both conditions. The red and green lines 
represent the regulatory relationships that only exist under conditions 1 and 
2, respectively. The DDN between the two conditions is the sub-network 
comprised of nodes MBP1_SWI6, CLB5, CLB6, PH02, FLOl, FLOlO and 
TRP4 and green and red lines. 


Table 1. ‘Hot spots’ identified by DDN analysis in simulation study 


Gene 

Fold change 

P-value 

(r-test) 

Gene 

Fold change 

P-value 

(r-test) 

CLB5 

8.92E-01 

6.92E-043 

MBPl SWI6 

3.74E-02 

4.58E-001 

CLB6 

4.79E-02 

2.00E-001 

PH02 

L54E-01 

1.34E-005 

FLOl 

-5.04E-02 

4.44E-001 

SWI4 

-L23E-01 

3.64E-007 

FLOlO 

7.73E-01 

3.52E-024 

TRP4 

8.26E-02 

LOOE-002 


TRP4 and green and red lines is the DDN that our algorithm tries to 
identify from expression data. 

The parameters for our algorithm are: threshold T is 0.25, P-value 
cutoff is 0.01 and the maximum size of Z, , K, is 2. Table 1 presents 
the ‘hot spots’ identified by the DDN analysis. Table 1 also shows 
the fold-changes of individual genes (after base 2 logarithm), and 
P-values of f-tests of individual genes. Our algorithm picked up 
all genes involved in topological changes, including some genes 
that did not show a significant difference in fold-change or f-tests, 
such as CLB6, FLOl and MBP1_SWI6. This indicates that our 
algorithm can successfully detect these interesting genes using their 
topological information, even though the means of their expressions 
did not change substantially between the two conditions. Therefore, 
this method is able to identify biomarkers that cannot be picked 
up by traditional gene ranking methods, providing a complimentary 
approach for biomarker identification problem. 

Figure 2 shows the DDN between the two conditions extracted 
by the proposed algorithm. The DDN shows network topological 
changes and the genes involved therein. The red lines in Figure 2 
represent the connections that exist only under condition 1, and the 
green lines represent the connections that exist only under condition 
2. Compared with the known network topology shown in Figure 1, 
the proposed algorithm correctly identified and extracted all the 
nodes with topology changes and 9 of 10 differential connections, 
with only the connection between PH02 and TRP4 under condition 1 
falsely missed, and the connection between PH02 and SWI4 under 



Fig. 2. The DDN extracted by the proposed algorithm in the simulation 
study. The red lines represent the connections (dependencies) that only 
exist under condition 1, and the green lines represent the connections 
(dependencies) that only exist under condition 2. The proposed DDN analysis 
successfully detected 9 of 10 connections that are different between two 
conditions and all the genes involved in the network topology changes. The 
connections between PH02 and SWI4 under condition 1 (red) and between 
MBP_SWI6 and SWI4 under condition 2 (green) were falsely detected and 
the connection between PH02 and TRP4 under condition 1 (red) was falsely 
missed. 

condition 1 and the connection between MBP1_SWI6 and SWI4 
under condition 2 falsely detected. 

3.2 Breast cancer dataset analysis 

We applied our method to the dataset from an ER-l- breast cancer cell 
study by Lin et al. (2004). In that dataset, the estrogen-dependent 
T-47D ER-l- breast cancer cell line was treated with 17/1-estradiol 
(E2) and with E2 in combination with the pure anti-estrogen ICI 
182 780 (ICI, Faslodex, Fulvestrant). Samples were then harvested 
on an hourly basis for the first 8 h (0-8 h) and bi-hourly for the next 
16 h (10-24 h) for a total of 16 time points under each condition. 
Experiments were performed on microarrays generated by spotting 
the Compugen 19 K human oligo library, made by Sigma-Genosys, 
on poly-L-lysine-coated glass slides. 

Here we are interested in the cellular response to the drug ICI, 
which inhibits E2 signaling through the ER (Howell, 2006). We first 
selected 55 genes that are reported in the literature to be relevant to 
breast cancer and responsiveness to ICI (for example, Kuo, 2007; 
Riggins et al., 2005, 2007). We then applied our DDN analysis to 
the data under two conditions (E2 versus E2-I-ICI). The parameters 
in our algorithm are: threshold T is 0.25, R-value cutoff is 0.01 and 
K is 2. 

Table 2 lists the genes that exhibit significant topological changes 
in the network identified by DDN analysis. The DDN under these 
two conditions is shown in Figure 3. The genes identified by 
the proposed algorithm and their expression results (Table 2) are 
consistent with published data. For example, XBPl and BCL2 show 
strongly decreased expression in response to E2-I-ICI relative to E2 
alone, and both of these genes are known to be induced by E2 
(Compel et al., 2000; Tozlu et al., 2006; Wang et al., 2004). 

In Figure 3, there are 18 red connections in the DDN, which 
implies that these connections exist only under E2 condition and 
disappear after the addition of ICI. Since ICI 182 780 is an ER 
antagonist, which works both by downregulating and by degrading 
the ER-alpha, it is plausible that these connections disappear because 
ICI is blocking or inactivating their connections. For example, as 
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Table 2. ‘Hot spots’ identified by DDN analysis in breast cancer study 
(see Supplementary Material for gene annotations) 


Gene 

Fold change 

P-value (Mest) 

Gene 

Fold change 

P-value (r-test) 

ABCBll 

2.05E-01 

8.48E-01 

ESR2 

-3.72E-01 

8.74E-01 

AKTl 

-4.02E+00 

3.92E-01 

F12 

4.10E+00 

3.81E-01 

BCAR3 

-4.92E-01 

9.37E-01 

HOXAIO 

4.14E+00 

7.89E-01 

BCL2 

-2.46E+00 

2.62E-01 

MAPKl 

-1.35E+00 

6.09E-01 

BIK 

-2.75E+00 

7.52E-01 

MARK 13 

2.81E+00 

5.01E-01 

BIRCl 

5.98E-01 

8.67E-01 

MARK 14 

-1.12E-01 

9.35E-01 

BIRC3 

2.66E+00 

5.60E-01 

MARK3 

2.42E-01 

9.67E-01 

CAV3 

4.12E+00 

7.92E-01 

MAPK4 

1.55E+00 

4.84E-01 

CGA 

4.19E+00 

7.00E-01 

MAPK8 

-6.73E+00 

1.65E-01 

COX7A2L 

3.94E+00 

2.32E-01 

NFKBl 

3.91E-02 

9.88E-01 

EBAG9 

2.04E+00 

6.76E-01 

NFKB2 

-9.15E-02 

6.92E-01 



Fig. 3. DDN between breast cancer cell line treated with E2 and cell line 
treated with E2+ICI. The red lines represent the connections that exist only 
in breast cancer cell line treated with E2, and the green lines represent the 
connections that exist only in breast cancer cell line treated with E2+IC1. 

a transcription factor, XBPl can directly regulate gene expression 
through binding to its response element (Iwakoshi et al., 2003), or it 
can act as a co-regulator of other transcription factors, most notably 
ER-alpha, to enhance their transcriptional activity (Ding el al., 2003; 
Fang et al., 2004). Because BCL2 contains response elements for 
both ER-alpha and XBPl (Gomez et al., 2007; Somai et al., 2003), 
the connection between XBPl and BCL2 in the DDN may either 
be direct or involve ER-alpha as a latent variable, or intervening 
gene. In direct support of this predicted edge, we have shown that 
constitutive overexpression of XBPl in a different breast cancer 
cell line (MCF-7) led to significantly increased mRNA and protein 
expression of both ER-alpha and BCL2 and functionally conferred 
antiestrogen resistance and estrogen-independence (Gomez el al., 
2007). 

Novel relationships between these genes identified by our DDN 
analysis will also serve as useful guidance for future studies. For 
example, BCAR3 is a well-established effector of cell motility, 
estrogen independence and antiestrogen resistance in ER-l- breast 
cancer cell lines (Felekkis el al., 2005; Riggins et al., 2003; 
Schrecengost et al., 2007; Van Agthoven el al., 2006). Expression 
of NFKB2 and its activator BCL3 are also associated with estrogen 
independence in breast cancer cell lines (Pratt el al., 2003), and 
these nuclear factor k B subunits appear to be selectively activated 


in clinical breast cancer (Pratt et al., 2003). However, there is 
no experimental evidence linking BCAR3 with NFKB2, so the 
suggestion that these two genes exhibit differential dependence 
under E2-treated conditions (Fig. 3) provides a starting point for 
biological studies of their relationship. 

Additional relationships that may be completely new to breast 
cancer are also identified by this method. For example, MAPK8 
(also known as JNKl) has been shown to be activated by BIRCl 
(also known as NAIP) during its inhibition of caspase-mediated cell 
death (Sanna et al., 2002). In chronic fatigue syndrome, growth 
factor receptor signaling can activate MAPK4, which via Ras 
and/or PI3K can subsequently increase AKTl activity (Englebienne 
and Meirleir, 2002). And finally, in B cells from patients with 
chronic lymphocytic leukemia NFKB1 (p50) homodimers are able 
to stimulate transcription from the BCL2 promoter through binding 
to another member of the BCL family (BCL3) (Viatour et al., 2003). 


3.3 Human and mouse ESC analysis 

ESCs can either maintain their pluripotency by self-renewal or 
undergo differentiation. The molecular mechanisms controlling ESC 
self-renewal and differentiation are complex and poorly understood 
(Sun et al., 2006; Zhan, 2008). ESCs harvested from different 
species show common characteristics, yet significant differences 
exist. Thus, cross-species analysis may help to distinguish between 
fundamental and species-specific mechanisms regulating ESC 
development (Sun et al., 2007; Zhan et al., 2005). Network biology 
can provide a new avenue for exploring ESC biology (Barabasi and 
Oltvai, 2004). Here, we used our new algorithm to conduct a human- 
mouse comparative analysis of ESCs, identifying evolutionarily 
divergent sub-networks. We focused our analysis on the cell cycle, 
a critical process for controlling cell development. In this study, 
58 cell-cycle genes were selected for the DDN analysis. The 58 
genes are the core components of the cell cycle machinery, and 
are orthologous between human and mouse cells. The expression 
profile data for these genes were determined from 18 samples from 
human ESCs and their earliest differentiation counterparts, embryoid 
bodies (EBs) and 18 samples from mouse ESCs and EBs, so that our 
inferred networks were directly related to ESC differentiation. The 
human ESC and EB expression data were determined from BGOl, 
BG02 and BG03 cell lines in our previous studies using Illumina’s 
BeadArrays (Liu et al., 2006), and from HI (Sato et al., 2003) and 
HES2 (E-MEXP-303 of the ArrayExpress database) cell lines using 
Affymetrix chips. The mouse ESC and EB expression data were 
determined from V6.5 (GSE3231 of GEO database), R1 (GSE2972) 
and J1 (GSE3749) cell lines, based on Affymetrix chips. The final 
datasets contained 9 ESC and 9 EB (14-day differentiated) samples 
from human and mouse cells, respectively. In the network analysis, 
we set K to 1, and threshold T to 0.2 and P-value cutoff to 0.01. 

Figure 4 shows DDN of the cell cycle between human and mouse 
cells (see Supplementary Material for gene annotations). The red 
lines represent the gene connections in human, and the green lines 
represent the connections in mouse. As shown, CDC25C, DUSPl 
and BUB 1 exhibit high connectivity on the network of human cells. 
On the other hand, PLKl, CDK2AP1, CDC20, TFDPl and CDC5L 
showed a high connectivity on the network of mouse cells. These 
results suggest evolutionary divergence across species during ESC 
development and may provide clues for insights into species-specific 


530 





DDN analysis 



Fig. 4. DDN between human and mouse ES/EB cells. The red lines represent 
the connections that exist only in human ES/EB cells, and the green lines 
represent the connections that exist only in mouse ES/EB cells. 

mechanism of the cell cycle in controlling ESC self-renewal and 
differentiation. 

4 DISCUSSIONS 

In this article, we propose a systematic approach to detect the 
statistically significant changes in transcriptional networks between 
two different experimental conditions. We tested our algorithm 
on simulation data, breast cancer data and ESC data. From the 
simulation study, we see that the proposed algorithm can capture 
the topological changes efficiently and accurately, even when the 
fold change of the expression values of each gene is not statistically 
significant. This approach utilizes the network structure information 
and provides an alternative way for biomarker identification. 
In addition, as knowledge of cellular networks accumulates, many 
biological databases will expand to contain more useful information. 
The proposed approach is an open framework, into which biological 
knowledge in specific applications can be easily incorporated as the 
local structure learning constraints. 

The high level of correlation among genes is a common feature of 
microarray data. Therefore, we propose a local dependency model 
that allows multiple predictor sets for each node. Accordingly, 
a local structure learning algorithm is also represented. Lasso is 
used to select features for the predictor sets (Tibshirani, 1996), 
an approach that has been successfully applied to variable selection 
and graph structure learning (Meinshausen and Buhlmann, 2006). 
In the linear Gaussian case, under certain conditions, it is proved that 
the probability of estimating the correct neighborhood converges 
exponentially to 1, and as a consequence it is possible to obtain 
a consistent estimation of the full edge set (Meinshausen and 
Buhlmann, 2006). However, in microarray data, the so-called 


irrepresentable condition (Zhao and Yu, 2006) or the neighborhood 
stability assumption (Meinshausen and Buhlmann, 2006) can easily 
be violated in the presence of highly correlated genes. Some 
modified algorithms have been proposed to deal with the highly 
correlated cases, for example, elastic net (Zou and Hastie, 2005) 
and network-constrained regularization (Li and Li, 2008), both of 
which tend to group highly correlated predictors in the regression 
process. However, these two approaches are not suitable for our 
problem, because the grouping of highly correlated variables can 
be different under two conditions and this makes the later statistical 
testing problematic. The local structure learning algorithm proposed 
here attempts to alleviate the effects of the highly correlated gene 
expression data and to preserve local structure information for 
further statistical testing. 

Some issues are worth further exploration. In this article, only 
linear relationships are considered. How non-linear relationships 
should be modeled efficiently and correctly, remains a difficult 
problem. Second, since many cellular reactions take place in the 
genome, transcriptome and proteome, it is essential to construct 
pathways by integrating data from heterogeneous sources. 

In sum, this article presents a new approach to extract knowledge 
of a biological network by emphasizing the dynamic nature of 
cellular networks and utilizing a network’s structural information. 
It also provides an alternative and promising approach to identify 
possible biomarkers and drug targets. 
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Resistance to endocrine therapies, whether de novo or acquired, remains a major limitation in the abil¬ 
ity to cure many tumors that express detectable levels of the estrogen receptor alpha protein (ER). 
While several resistance phenotypes have been described, endocrine unresponsiveness in the context of 
therapy-induced tumor growth appears to be the most prevalent. The signaling that regulates endocrine 
resistant phenotypes is poorly understood but it involves a complex signaling network with a topology 
that includes redundant and degenerative features. To be relevant to clinical outcomes, the most pertinent 
features of this network are those that ultimately affect the endocrine-regulated components of the cell 
fate and cell proliferation machineries. We show that autophagy, as supported by the endocrine regulation 
of monodansylcadaverine staining, increased LC3 cleavage, and reduced expression of p62/SQSTlVn, plays 
an important role in breast cancer cells responding to endocrine therapy. We further show that the cell 
fate machinery includes both apoptotic and autophagic functions that are potentially regulated through 
integrated signaling that flows through key members of the BCL2 gene family and beclin-1 (BECNl). This 
signaling links cellular functions in mitochondria and endoplasmic reticulum, the latter as a consequence 
of induction of the unfolded protein response. We have taken a seed-gene approach to begin extracting 
critical nodes and edges that represent central signaling events in the endocrine regulation of apoptosis 
and autophagy. Three seed nodes were identified from global gene or protein expression analyses and sup¬ 
ported by subsequent functional studies that established their abilities to affect cell fate. The seed nodes of 
nuclear factor kappa B (NFkB), interferon regulatory factor-1 (IRFl), and X-box binding protein-1 (XBPl) 
are linked by directional edges that support signal flow through a preliminary network that is grown 
to include key regulators of their individual function: NFMO/lKK"y, nucleophosmin and FR respectively. 
Signaling proceeds through BCL2 gene family members and BFCNl ultimately to regulate cell fate. 

® 2009 FIsevier Ltd. All rights reserved. 


1. Introduction 

Over 40,000 American women die of breast cancer each year [ 1 ] ; 
incidence is broadly similar across the European Union when con¬ 
sidered as a percentage of the population. In 2008, over 178,000 
women will be diagnosed with invasive breast cancer in the 
US., almost 70% of which will be estrogen receptor-a positive 
(ER+; HUGO Gene Symbol = ESR1) [2,3], The percentage of ER+ 
sporadic breast cancers increases linearly with age but even in pre¬ 
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menopausal cases the proportion is high; 62% at age <35 and 72% 
by age 49 [2-4], Data from randomized trials and meta-analyses 
clearly show that all breast cancer patients derive a statistically 
significant survival benefit from adjuvant chemotherapy, and that 
all hormone receptor positive breast cancer patients benefit from 
adjuvant endocrine therapy [5-9], For postmenopausal women, the 
benefit from adjuvant Tamoxifen (TAM) is comparable to that seen 
for cytotoxic chemotherapy. While 5 years of adjuvant TAM pro¬ 
duces a 26% proportional reduction in mortality [8], many ER+ 
tumors eventually recur [10], Since advanced ER+ breast cancer 
largely remains an incurable disease, resistance to endocrine ther¬ 
apies is a significant clinical problem. 

Endocrine therapy is administered as an antiestrogen (AE) like 
Tamoxifen (TAM) or Fulvestrant (FAS; Faslodex; ICl 182,780), or as 
an aromatase inhibitor (AI) such as Letrozole or Exemestane. It is 
less toxic and potentially more effective therapy in the management 
of hormone-dependent breast cancers. Antiestrogens, and TAM in 
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particular, have been the “gold standard” first line endocrine ther¬ 
apy for over 30 years [11 ], clinical experience with this drug likely 
exceeding over 15 million patient years [10], TAM increases both 
disease free and overall survival from early stage breast cancer, and 
it also reduces the incidence of invasive and noninvasive breast can¬ 
cer in high-risk women [8,9], Raloxifene, another antiestrogen, is 
effective in reducing the rate of postmenopausal bone loss from 
osteoporosis as well as the rate of invasive breast cancer [12], Newer 
antiestrogens such as FAS show significant activity relative to TAM 
and some Als [13,14], Third generation AIs are now widely accepted 
as viable alternatives to AEs for first line endocrine therapy in post¬ 
menopausal women with metastatic disease; overall response rates 
are generally greater for Als [15[. Importantly, Tamoxifen is the only 
single agent with demonstrated efficacy in both premenopausal 
and postmenopausal women with invasive breast cancer. Other AEs 
and all of the Als require the complete cessation of ovarian function. 

Of current interest is identification of the optimum choice and 
scheduling of AEs and Als. Evidence clearly shows improvements 
in disease free survival for combined adjuvant therapy (an A1 and 
an AE usually given sequentially) over single agent TAM [16-20]. 
However, the ability of Als to induce a significant improvement in 
overall survival compared with 5 years of TAM alone is uncertain 
[ 15], In terms of metastatic disease, recent data imply that response 
rates with an AI are either equivalent with or higher than with 
TAM [21,22]. Given the increasing number of endocrine treatment 
options, there is a clear need to optimize the selection and schedul¬ 
ing of agents for both early stage and advanced disease. Whichever 
way these controversies are eventually resolved, it is clear that both 
AIs and AEs will remain as key modalities in the management of ER+ 
breast cancers. Unfortunately, the inability of endocrine therapies 
to cure many women with ER+ disease will also remain. 

3.1. Endocrine resistance: receptor phenotypes 

Several resistance phenotypes are evident from both experi¬ 
mental models and clinical observations. The two primary receptor 
phenotypes are ER+ and ER-. These receptor-based phenotypes 
have been further stratified by addition of the estrogen-regulated 
receptor for progesterone (PGR; HUGO Gene Symbol = PGR). The 
degree of treatment benefit from endocrine therapy varies accord¬ 
ing to receptor phenotype. For example, approximately 75% of 
ER+/PGR+, 33% of ER+/PGR-, and 45% of ER-/PGR+ cases of 
metastatic breast cancer respond to TAM [10], Endocrine responses 
in truly ER- tumors are probably relatively rare and of uncertain 
relevance, as they most likely reflect incorrect assessments of what 
may be very low ER and/or PGR expression values. Data from the 
Early Breast Cancer Trialists’ Collaborative Group meta-analyses 
show that TAM therapy generates a non-significant 6% reduction in 
the 10-year risk of recurrence. A non-significant increase in the risk 
of death from any cause in patients with ER— breast cancer also was 
reported [8,9]. The real value of PGR, which is the only modification 
to this clinical prediction scheme for directing endocrine therapy to 
become routine in over 30 years (the value of directing endocrine 
therapy based on HER2 is still controversial), is largely limited to 
ER- tumors. It is general practice in the United States to treat all ER+ 
and/or PR+ invasive breast tumors with endocrine therapy. How¬ 
ever, it remains impossible to predict whether an individual patient 
will receive benefit from treatment and the magnitude or dura¬ 
tion of any benefit. Better predictors of each individual patient’s 
endocrine responsiveness are clearly needed. 

3.2. Endocrine resistance: pharmacological phenotypes 

Several pharmacological phenotypes have been identified in 
experimental models of either human breast cancer cells growing 
in vitro or of xenografts in immune-deficient rodents [10[. These 


phenotypes include (i) estrogen-independent (which appears 
equivalent to AI resistance but is not so for antiestrogen resistance 

[23] — some breast cancers can become resistant to an AE but still 
respond to an AI and vice versa); (ii) estrogen-inhibited (recently 
identified in MCF-7 models [24]); (hi) TAM-stimulated (identi¬ 
fied first in MCF-7 xenografts [25,26]); TAM-unresponsive but FAS 
sensitive [27] (identified first in MCF-7 models and subsequently 
observed in clinical trials [13]); TAM and FAS crossresistant [28] 
(perhaps this is truly antiestrogen crossresistant and it is seen both 
clinically in patients and experimentally in MCF-7 models [13,29]). 
Other variations on these phenotypes likely occur but are beyond 
the scope of our discussion. 

3.3. Clinical evidence for the prevalence of pharmacological 
resistance phenotypes 

Obtaining direct clinical evidence for the prevalence of each of 
the pharmacological resistance phenotypes is challenging. We have 
previously noted the utility of applying clinical responses to TAM 
withdrawal in metastatic breast cancer as one means to define, 
at least in broad terms, the likely relevance of a series of phar¬ 
macological phenotypes [29[. This approach is somewhat limited, 
as the number of cases across all studies is modest (n = 241). Fur¬ 
thermore, TAM withdrawal responses cannot readily distinguish 
between TAM-stimulation and estrogen-inhibition because each 
should predict for a clinical benefit. The latter would induce a bene¬ 
fit because many breast cancers contain significant concentrations 
of 17p-estradiol, independent of both menopausal and ER/PGR sta¬ 
tus [10], sufficient to produce the estrogen-inhibited phenotype 

[24] . Indeed, the superiority of Als over TAM in inducing clinical 
response strongly implies that over 75% of ER+/PGR+, at least 50% 
of all ER+ breast cancers irrespective of PGR expression, and 45% or 
more of ER-/PGR+ breast tumors are probably driven by adequate 
access to estrogen. 

In our prior assessment, almost 9% of patients received an overall 
clinical response to TAM withdrawal (partial responses + complete 
responses). When disease stabilizations were included we esti¬ 
mated that less than 20% of patients received clinical benefit [29], 
suggesting that the sum of TAM-stimulated plus estrogen-inhibited 
clinical phenotypes may not account for the majority of resis¬ 
tant phenotypes in women. Of course, given the number of ER+ 
breast cancers arising every year, these phenotypes are relevant to a 
notable number of women. The major response to TAM withdrawal 
was clinically detectable disease progression - greater than 80% of 
cases - strongly implicating unresponsiveness as the primary clini¬ 
cal resistance mechanism to TAM. Whether these breast cancers are 
fully crossresistant to all endocrine therapies, or retain sensitivity 
to AIs, cannot be determined from this simple analysis. 

Nomura et al. [30] took a different approach and assessed the 
responsiveness to estrogen and TAM in short-term primary cell cul¬ 
tures of n = 153 ER+ breast cancer biopsies. This approach allowed 
the authors to separate the various pharmacological phenotypes; 
approximately 7% of ER+ primary cultures were stimulated by TAM 
and almost 3% were inhibited by physiological concentrations of 
estradiol—notably close to our estimate of 9% for the sum of these 
two clinical phenotypes. 

It is important here to separate responses to physiological estro¬ 
gens from those produced by pharmacological estrogen therapy. 
High dose estrogen therapy was used prior to the advent of TAM. 
As with all endocrine therapies, approximately 30% of all breast can¬ 
cers (receptor status was not available when most of these studies 
were done) responded [31,32]. Side effects were unfavorable, prob¬ 
ably explaining the switch to TAM that also induces responses in 
approximately 30% of all breast cancers (when receptor status is not 
considered). It is also likely that the mechanisms of action of phar¬ 
macological and physiological dose estrogens differ. Over 15 years 
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ago, we were the first to show that pharmacological concentrations 
of both estradiol and TAM induce changes in the membrane fluid¬ 
ity of breast cancer cells and that this correlates with changes in 
cell growth [33], It is unlikely that membrane fluidity changes are 
major contributors to the action, either prosurvival or prodeath, of 
physiological estrogen exposures but they likely do contribute to 
the prodeath effects of pharmacological exposures. 

2. Cell fate in the context of endocrine responsiveness 

Therapeutic strategies for breast cancer generally aim to alter the 
balance between cell death and cell survival such that cancer cells 
(but ideally not normal cells) die. However, endocrine therapies 
consistently also induce a notable growth arrest in sensitive tumors. 
The relative importance of growth arrest and cell death remains 
unclear. To explore this issue, we will first discuss the forms of cell 
death and then compare the potential for cell death and cell growth 
arrest to contribute to endocrine responsiveness. 

Cell death pathways include signaling to apoptosis, autophagy, 
mitotic catastrophe, necrosis, and senescence. Late events in cell 
death are reasonably well defined at the molecular (such as PARP 
cleavage) and cellular levels (including DNA disintegration). How¬ 
ever, knowledge of the regulatory signaling upstream of these 
events, and how this signaling is integrated and processed, is now 
known to be incomplete. Mitochondrial function and integrity, reg¬ 
ulated in part by BCL2 family members, are central to several forms 
of cell death [34-36], 

2.1. Apoptosis 

Apoptosis is a programmed cell death defined by morphological 
criteria related to organized chromatin condensation and frag¬ 
mentation of the cell nucleus, accompanied by cleavage of DNA, 
formation of apoptotic bodies, cell shrinkage, and ruffling of the 
cell membrane [35,37,38]. Two major pathways are involved. The 
intrinsic (mitochondrial) pathway is regulated by the proapoptotic 
and antiapoptotic BCL2 family members; this pathway involves 
changes in mitochondrial membrane permeability (MMP), release 
of cytochrome c, exposure of phosphatidylserine on the outer leaflet 
of the plasma membrane, and the eventual loss of plasma mem¬ 
brane integrity [39]. The extrinsic (cell surface receptor) pathway 
is dependent upon extracellular signals including tissue necrosis 
factor-a (TNFa), Fas ligand, and TNF-related ligand TRAIL [37,38]. 
The intrinsic and extrinsic pathways activate caspases, the “exe¬ 
cutioners” of apoptosis, which cleave DNA and catabolize the 
cytoskeleton. Apoptosis is not a discrete process and occurs over 
time—early (4-18 h), middle (18-36 h), and late stages (>36h) are 
often described based largely on data from cell culture models. 
Changes in specific BCL2 family members (early events that can 
precede changes in MMP), changes in MMP, and the exposure 
of phosphatidylserine are generally interpreted as representing 
early-to-middle apoptosis. Cytoplasmic cytochrome c release from 
mitochondria, changes in propidium iodide staining, increased ter¬ 
minal transferase dUTP nick end labeling (TUNEL) and cleavage of 
the DNA repair enzyme PARP-1 are associated with late apoptosis 
or necrosis [35]. 

2.2. Autophagy 

Autophagy is a lysosomal pathway where cytoplasmic contents 
are degraded by double/multi-membrane vacuoles or autophago¬ 
somes, normally resulting in the removal of defective or damaged 
organelles, e.g., mitochondria. A better understanding of the regu¬ 
lation of autophagy has recently begun to emerge; key regulators 
are now known to include BCL2 family members [40,41 ] and their 


interacting proteins such as beclin-1 /ATG6 (BECNl) [42], BCL2 anti¬ 
apoptotic proteins can block autophagy by inhibiting BECNl [36], 
Since monoallelic loss of the BECNl locus is seen in >40% of breast 
cancers [43] (and in MCF-7 cells), modulating BCL2 may be an 
effective mechanism for regulating BECNl-activated autophagy. 
Autophagy can be identified by the absence of marginated nuclear 
chromatin, the presence of cytoplasmic vacuoles using trans¬ 
mission electron microscopy or monodansylcadaverine [44,45], 
cleavage of the LC3B protein [46,47], and regulation of the 
p62/SQSTMl protein [48], Early events in autophagy may be 
reversible; later events may (or appear to) share mechanisms with 
other cell death pathways. Eor example, cleavage of ATG5 by caplain 
[49] or upregulation of BID [41 ] can cause a switch from autophagy 
to apoptosis. 

Paradoxically, autophagy can act as a cell survival mechanism 
when extracellular nutrients or growth factors are limited, or as 
an alternative cell death pathway to apoptosis [50], Prosurvival 
outcomes likely reflect an adequate adjustment to stress, with 
energy/nutrients recovered from the organelles “digested” in the 
autophagosomes. Prodeath outcomes may arise when the self¬ 
digestion of autophagy leads to such a loss of organelles that the 
cell can no longer survive. In cancer cells, autophagy induction 
can accelerate cell death [51-55] or promote cell survival [56-58], 
independently or in response to treatment with cytotoxic agents. 

2.3. Mitotic catastrophe 

Faulty DNA structure checkpoints, or the spindle assembly 
checkpoint, are key components of this form of cell death [59,60]. 
Disruption of the normal segregation of many chromosomes results 
in rapid cell death [59[. When this cell death does not occur, the cell 
can divide asymmetrically and produce aneuploid daughter cells 
[61] that can become neoplastic [59,61], Thus, mitotic catastrophe 
is characterized by multinucleation. 

2.4. Necrosis 

Necrosis is a chaotic process marked by cellular edema, vac¬ 
uolization of the cytoplasm, breakdown of the plasma membrane, 
and an associated inflammatory response caused by the release of 
cell contents into the surroundings. Increased permeability to try¬ 
pan blue or other vital dyes, in the absence of organized chromatin 
condensation and DNA fragmentation, is characteristic of necrosis 
[44,62]. 

2.5. Senescence 

Senescent cells are characteristically enlarged, flattened with 
vacuoles and a large nucleus, be come permanently cell cycle 
arrested and unresponsive to mitogenic stimuli and express (3- 
galactosidase [45,63], Normally, as telomerase activity falls over 
time, successive telomere shortening limits proliferation and leads 
to “cellular senescence” or “mortality stage 1 (Ml)”. Inactivation 
of p53 can by bypass Ml growth arrest, producing critically short 
telomeres and massive cell death called “mortality stage 2 (M2)” or 
“crisis” [64], 

2.6. Endocrine-induced cell death in breast cancer 

Precisely how breast cancer cells die following estrogen with¬ 
drawal (or AI treatment) or AE treatment is unclear. Senescence 
may not be the dominant mechanism, since this process frequently 
involves DNA damage and p53 activation [38,45] but breast can¬ 
cer cells respond to AEs and to estrogen withdrawal even if they 
have mutated p53 [35,65], While apoptosis is clearly implicated 
[65-68], some of the apoptosis endpoints in prior studies may not 
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Fig. 1. Autophagy is enhanced upon FAS treatment in ER+ breast cancer cell lines. MCF-7 cells were treated with FAS (ICl 182,780), the endoplasmic reticulum stress and 
autophagy inducer tunicamycin (TUN), or ethanol control (vehicle) prior to staining with monodansylcadaverine (MDC). Increased MDC staining indicates that autophagy 
has been induced. 


distinguish among earlier events more closely implicated with sig¬ 
naling initiated through autophagy. Autophagy has been implicated 
in response to endocrine therapy [69-71 ] and we also see the induc¬ 
tion of significant autophagy associated with endocrine therapies. 

Fig. 1 shows our ability to detect significant changes in the num¬ 
ber of autophagosomes as measured by an increase in the presence 
of cytoplasmic vacuoles identified by monodansylcadaverine stain¬ 
ing [44,45] (Fig. 1), increased cleavage of the LC3 protein [46,47], 
and reduced expression of p62/SQSTlVIl [48,72-74] (Fig. 2). We have 
previously shown, as have others, that AE treatment and estrogen 
withdrawal are also accompanied by increases in the level of apop¬ 
tosis and growth arrest in sensitive cells. Indeed, when restoring 
AE sensitivity in resistant cells we frequently see that sensitiv¬ 
ity is reflected in the restoration of an ability of the antiestrogen 
(or estrogen withdrawal) to both increase apoptosis and reduce 
proliferation [75,76]. As shown in Figs. 1 and 2, and consistent 
with other reports [69-71], prodeath autophagy also is associ¬ 
ated with the growth inhibitory effects of endocrine therapies in 
breast cancer cells. Thus in experimental models, cells respond¬ 
ing to endocrine therapies concurrently experience an increase in 
cell growth arrest accompanied by both apoptosis and a prodeath 
autophagy. 

2.7. Proliferation, cell death, and endocrine responsiveness 

One of the most consistent observations in both experimental 
models in vitro and in vivo and in clinical specimens is the abil¬ 
ity of endocrine therapies to induce a profound growth arrest in 
sensitive breast cancer cells. However, the relative importance of 
increased cell death compared with reduced proliferation is not 
entirely clear. In most endocrine sensitive experimental models, 
growth arrest and cell death concurrently occur and both clearly 
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Fig. 2. Autophagy is enhanced upon FAS treatment in FR+ breast cancer cell 
lines. MCF7/LCC1 cells were treated with FAS, TUN, or vehicle prior to lysis 
and immunoblotting using standard procedures. Increased LC3B11 (asterisk) and 
decreased p62/SQSTMl expression both indicate that autophagy has been induced. 


contribute to the ability of endocrine therapies to affect changes in 
anchorage-dependent cell number, anchorage-independent colony 
formation, or tumorigenesis over time [27,77,78], Less clear is their 
relative contribution in driving clinical responses to endocrine ther¬ 
apies. Growth arrest appears to be readily detected in breast tumors 
responding to endocrine therapy. Less clear is the ability to detect 
robust changes in apoptosis. Some investigators do [79], and some 
do not [66], see an association of apoptosis or a molecular maker(s) 
of apoptosis with clinical response. The latter is in marked con¬ 
trast to studies in experimental models. For some studies, response 
is related to molecular markers of apoptosis such as BCL2 [79] or 
the FasLiFas ratio [80[. Notably, expression of the anti-apoptotic 
molecule BCL2 is reduced in responsive breast tumors by 3 months 
of TAM treatment [79], while in breast tumors that remain after 
TAM therapy BCL2 expression is elevated [81 [. However, as noted 
above, BCL2 can affect both an apoptotic and autophagic cell death 
and its measurement alone is likely a poor predictor of any specific 
cell death mechanism. 

If cell death does not occur in clinical breast cancer this obser¬ 
vation clearly requires explanation. Several possible explanations 
exist—in the absence of compelling experimental/clinical data sup¬ 
porting or eliminating these explanations we make no assessment 
at this time on their relative merits. Firstly, it should be noted that 
measures of apoptosis are usually the primary endpoints for assess¬ 
ing rates of cell death. Our previously published results, the data in 
Figs. 1 and 2, and the work of others [69-71] show that estrogen 
withdrawal or antiestrogens increase both the rates of apoptosis 
and autophagy in breast cancer models responding to treatment. 
We interpret this as a prodeath autophagy in sensitive cells, con¬ 
sistent with other reports [69-71]. It remains unclear whether 
autophagy or apoptosis dominates as the cell death mechanism or 
whether this varies among different breast cancer cells. Measuring 
apoptosis may be the wrong measure of cell death in tumors, or 
it may be an inadequate measure if it represents only some pro¬ 
portion of cells that die through this process. Secondly, apoptosis is 
often considered to comprise early, mid and late stages, and an irre¬ 
versible commitment to cell death may not be robustly associated 
with endpoints other than those definitively reflecting late stage 
apoptosis. A measure of apoptosis that is not robustly associated 
with ultimate cell death could provide an incomplete assessment 
of the rate or extent of cell death. Thirdly, if the timing of apopto¬ 
sis is as fast in patient tumors as it is in vitro, measurements taken 
before 24-36 h and/or after 36-48 h could miss many of the key 
events. The most sensitive cells would have been through apopto¬ 
sis and be already dead and gone, and the rate of apoptosis could 
have returned to the basal level. Fourthly, duration of the apoptotic 
response may differ between basal apoptosis and drug-induced 
apoptosis. If drug-induced apoptosis leads to a more rapid death, 
the number of cells processing though apoptosis could increase 
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without any detectable change across time in the apparent rate of 
apoptosis. 

Finally, a reduction in cell proliferation alone could be suffi¬ 
cient to account for some shrinkage of tumor size, as the rate of 
cell replacement might no longer be sufficient to account for cell 
loss from either a basal rate of cell death and/or loss to migration 
and metastasis. However, unless almost all growth arrested cells 
also undergo some form of cell death, it is unclear why growth 
arrest alone should lead to large and relatively rapid reductions 
in tumor size (over several weeks compared with often many years 
of presumably much longer growth prior to clinical detection and 
treatment). Growth arrest alone may be sufficient to account for 
good responses in some tumors, particularly where there is a high 
basal rate of cell death. However, it is not immediately clear how 
this applies to tumors with an inherently low rate of proliferation, 
whether because the growth fraction is large but cycling slowly or 
the growth fraction is small but proliferating rapidly. This is an area 
where mathematical modeling could be particularly useful, since it 
could compare the effect sizes needed for relative changes in pro¬ 
liferation and cell death to affect predicted overall tumor size over 
time. 

While there is currently no definitive understanding of the pri¬ 
mary cell death mechanisms in either experimental models or in 
breast tumors in women, or of the relative importance of endocrine 
therapy-induced changes in proliferation compared with cell death, 
there are potentially important implications for the underlying biol¬ 
ogy of the cancer cells. If the primary driver of response as seen 
in tumor shrinkage is a reduction in proliferation, this will leave 
many cells alive and still metabolically active. Surviving cells have 
the ability to adapt to the endocrine-induced stress and eventually 
overcome the proliferative blockade and grow—they will become 
resistant. This process seems unlikely to occur in many of those 
women who receive the clear long term benefit of a significant 
reduction in the risk of death [8,9]. 

Whether it is the growth arrested but surviving cells that even¬ 
tually become resistant is unknown but it is certainly an intuitively 
satisfying hypothesis. Moreover, this hypothesis is supported by 
the ability to take sensitive cells in culture, expose them for pro¬ 
longed periods to either estrogen withdrawal or AE treatment, and 
eventually induce an acquired resistant phenotype [27,28,77,82], 
This process is accompanied by a profound and prolonged period 
of growth arrest prior to the emergence of resistant cells, a pat¬ 
tern consistent with the clinical progress of the disease in tumors 
that initially respond to therapy but that eventually recur—often a 
decade or more after the initiation of TAM treatment. 


3. Molecular signaling and resistance 

The precise mechanisms of resistance to an AE and/or an Al 
remain unclear, reflecting an incomplete understanding of the sig¬ 
naling affecting cell proliferation, survival, and death and their 
hormonal regulation in breast cancer. We have previously reviewed 
the mechanisms of resistance to AEs and to estrogen depriva¬ 
tion elsewhere in some detail [10,23,29], so we focus here on the 
molecular signaling aspects of resistance and how these may be 
integrated and explored using emerging technologies. We will focus 
primarily on signaling to cell death—signaling to regulate prolifer¬ 
ation in the context of endocrine responsiveness will be the subject 
of a separate review. 

The primary technologies that have matured sufficiently to 
enable global approaches to network modeling include gene 
expression microarrays, ChIP-on-chip, SNP chips, high-throughput 
DNA sequencing, and array CGH. Each of these technologies has 
reached a high level of maturity, and each is characterized by 
the generation of very high dimensional data on each sample 


whether the read-out be genomic or transcriptomic data; this also is 
true of the emerging high-throughput proteomic technologies. The 
remarkable volume of data, and the diversity of biological infor¬ 
mation that informs the interpretation of these data, has begun to 
transform the fields of biostatistics, computer science, and bioin¬ 
formatics. However, the properties of these datasets are often not 
fully understood nor are the challenges these properties provide for 
data analysis and network modeling. Readers interested in explor¬ 
ing some of these challenges can read recent reviews [83,84]. Here 
we will address briefly several approaches to the use of these data 
for network modeling. 

3.3. A network signaling hypothesis of endocrine responsiveness 

Estrogen-independence and AE resistance are complex pheno¬ 
types and both genomic and non-genomic activities are implicated 
[10,33,85]. We consider it unlikely that endocrine resistance in 
ER+ tumors is driven by a single gene/signaling pathway. Unlike 
many previous single gene/pathway studies, our central hypothe¬ 
sis invokes a gene network that confers diversity and redundancy 
in signaling [10,86]. The cell death/survival network incorporates 
specific signaling as affected by estrogen and AE modification of 
ERa function. Thus, AEs regulate this network differently than other 
agents such as cytotoxic drugs. 

Signaling leads first to the reversible initiation of several cell 
death/survival signaling pathways within the network. The irre¬ 
versible machinery of cell destruction is activated at some later 
point. This machinery may induce common outcomes - such as 
activation of effector caspases and DNA/plasma membrane dis¬ 
integration - independent of the early specific initiating signals. 
Hence, we envision multiple concurrent signals processing through 
this network, some prosurvival and some prodeath, with cell fate 
reflecting the dominant signaling. In endocrine resistant cells, 
endocrine regulation and/or function of components of this net¬ 
work are changed and prodeath signals are either no longer induced 
or dominant. 

This cell fate signaling network hypothesis is intuitively logical 
and certainly testable. Evidence that cells induce prosurvival signal¬ 
ing in an attempt to circumvent stressors implies that some cells are 
successful and ultimately survive whereas others are unsuccessful 
and die. Thus, the balance between prosurvival and prodeath sig¬ 
naling is likely the final arbiter of cells fate [83]. While this remains 
an area of active investigation, we first discuss the basic principles 
of network modeling and then provide an example of a seed-gene 
network of endocrine-regulated signaling in endocrine responsive¬ 
ness. 

3.2. Basic concepts of gene networks 

Cellular signaling occurs more in the context of interactive net¬ 
works than through linear pathways [83]. The basic topology of a 
network is defined by nodes (genes/proteins) and their intercon¬ 
nections (edges). Interconnections are multi-faceted and include 
one-to-one, one-to-many, or many-to-one relationships, and feed¬ 
forward or feed-back loops. The dynamic activity of a network is 
constrained by the various forms of interactions, and the network 
behaves only in certain ways and controlled manners in response 
to changing cellular conditions or external stimuli ]87]. While 
often built solely from gene expression microarray data, these data 
are high dimensional and contain spurious correlations that can 
confound simple solutions for network building ]83,84]. Relevant 
events also occur in the genome and proteome, some of which can 
affect the transcriptome. For example, a transcription factor (TF) 
may be activated by phosphorylation and bind to responsive ele¬ 
ments in the genome but the regulation of its downstream targets 
is seen in the transcriptome ]83]. An example of this relationship is 
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Fig. 3. Illustration of the complex and challenging nature of pathway analysis. Genes 
identified as being differentially expressed in resistant MCF7/LCC9 cells by SAGE 
and gene expression microarray were analyzed by Pathway Architect (Stratagene) 
to identify relationships in silico. 

the ligand-independent activation of ERa following its phosphory¬ 
lation on SERllS by MAPK [88], 

Simplistically, there are two basic approaches to network model¬ 
ing of high dimensional data; top-down and bottom-up. The former 
is probably the most widely used approach as several accessible 
commercial software packages are available that make this an easy 
task to perform without the need for training in biostatistics or 
bioinformatics. These packages often apply various implementa¬ 
tions of gene ontologic and semantic search algorithms that identify 
cellular functions and pathways to which individual nodes are 
assigned; these data are then graphically represented. 

The solutions produced by several popular top-down algorithms 
are often characterized by representations of tens-to-hundreds of 
nodes linked by hundreds-to-thousands of edges, making inter¬ 
pretation challenging (Fig. 3). Whether the algorithms address the 
confounding properties of high dimensional spaces, such as the 
curse of dimensionality or the confound of multimodality, or incor¬ 
porate the critical aspects of cellular context and alleviate the trap 
of self-fulfilling prophesy, is not clear [83], Among the additional 
challenges are the incompleteness of relevant biological knowledge 
and the annotation error rate in the source databases searched by 
these algorithms [83]. Nonetheless, these approaches can be useful 
when carefully applied and their limitations fully understood, and 
when experts from both the biological and mathematics domains 
combine expertise to assess the validity of the solutions. Currently, 
such approaches probably have most to offer in the area of hypoth¬ 
esis generation, rather than in the construction of truly biologically 
meaningful signal transduction networks. 

3.3. The “seed-gene" approach to network modeling 

The bottom-up approach is generally referred to as the “seed- 
gene approach” to network modeling [89]. This approach requires 
the extraction of a small number of seed genes from within the pri¬ 
mary data; these genes are then used to grow the network in several 
ways. We will not address all the various approaches in this review 
but provide a few brief examples. Various modeling methods can 
be applied to find and link adjacent nodes, growing the network de 
novo. Local subnetworks can be identified and overlaid or linked to 


the initial seed genes. A simple approach is the incorporation of a 
canonical pathway (which may be a subnetwork in what would be 
a final and much broader network) when it is known to be relevant 
in the cellular context under study and where incorporating the 
nodes and edges of the canonical pathway members is consistent 
with statistical properties of the growing model topology. 

Knowledge of how a gene (node) affects the expression/function 
of another node provides directional connectivity information that 
can be applied to the interacting nodes. Transcription networks can 
be grown (or transcriptional edges between nodes in a network 
that incorporates other biological knowledge) by linking TFs to their 
downstream targets. These targets can be predicted using specific 
algorithms [90-93]; where possible it is preferable to incorporate 
functional data such as that obtained from ChIP-on-chip arrays [91 ]. 
Thus, interacting nodes can be identified along with the direction¬ 
ality of their edges as the seed-gene network is grown. 

The most labor intensive approach is to derive experimentally 
nodes and edges, growing the network using definitive laboratory- 
derived knowledge. Where additional high-throughput data are 
already available, such as ChIP-on-chip, this is preferable. Currently, 
functional data is probably more often obtained one gene at a time, 
using standard molecular methods such as gene knock-down and 
overexpression. This laborious approach is becoming supplanted 
with the emerging functional genomic methods such as siRNA, 
ribozyme, or antisense libraries that can test experimentally the 
contribution of hundreds to thousands of genes. These methods 
enable investigators to extract concurrently nodes that experimen¬ 
tally generate biologically appropriate changes in the phenotype 
under investigation. 

Once seeds and their edges are identified, and functional bio¬ 
logical metadata obtained, interactive models can be grown using 
neural network and other machine learning tools. Several models 
have been proposed to reveal the behaviors of regulatory networks 
from gene expression data [22,23] including Boolean networks 
[24-26], Bayesian networks [27-30], linear additive regulation 
models [31,32], state-space models (SSMs) [33,34], and recurrent 
neural networks (RNN) [35,36]. However, these methods use only 
mRNA expression data to infer networks. 

Integrated approaches have been recently proposed to learn 
transcriptional regulation from various data sources [27,30,37-43]. 
An iterative search on mRNA expression and ChIP-on-chip data [37], 
or the incorporation of expression profiles, ChIP-on-chip, and motif 
data [41 ] have each been used in yeast to discover transcriptional 
networks. Several linear models or matrix decomposition meth¬ 
ods have also been proposed [43-46]. Network component analysis 
(NCA) is a notably powerful approach [45] but NCA and these other 
methods cannot easily infer regulatory networks in biological sys¬ 
tems more complex than yeast. 

Other limitations exist in network modeling. Complete bio¬ 
logical knowledge for topology estimation (node-node edges and 
directionality), such as high-throughput ChlP-on-chip data or func¬ 
tional data from laboratory experiments, are often not (or only 
partially) available for human cells. When heterogeneous data 
sources are integrated for computational inference, the consistency 
of different data sources is often inadequate or unknown. Topologi¬ 
cal knowledge also comes from biological experiments, which often 
contains false positives/negatives that can lead to incorrect network 
inference. 


4. Seed-gene model for cell signaling and the regulation of 
cell fate 

While we continue to develop new methods for network mod¬ 
eling, we have yet to report our modeling approaches to our own 
expanding data sets. Hence, we will here describe our initial studies 
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on the use of seed genes and experimental data to construct a sim¬ 
ple wiring-diagram of our initial seed-gene network. The inability 
to induce signaling to irreversible cell death is a central component 
of drug resistance [94], Thus, we propose that cells possess a com¬ 
mon cell death/survival regulatory decision network of integrated 
and/or interacting pathways (see above). 

Prior to building network models, it is necessary to extract ini¬ 
tial nodes (seed genes) from which a network can be built [89], 
Since ER is a TF and regulates other functionally relevant TFs that 
influence endocrine responsiveness and cell fate, selecting a small 
number of TFs as seed genes is reasonable for network modeling. 
The full list of relevant ER-regulated TFs that may affect cell fate 
is unknown. Nonetheless, our published data support the central 
hypothesis that that IRFl [65,95-97], XBPl [76,95] and NFkB(RELA) 
[75,95] are key regulatory nodes or control key modules in this net¬ 
work. Moreover, our experimental data in endocrine sensitive and 
resistant breast human cancer cells now allow us to map their edges 
and directionality, in an appropriate cellular context, with some 
confidence. 

4.1. X-box binding protein-] (XBPl) and the unfolded protein 
response (UPR) 

UPR is a central component of the endoplasmic stress response 
[98], an adaptive signaling pathway that allows cells to survive the 
accumulation of unfolded proteins in the endoplasmic reticulum 
lumen [99[. Initially a compensatory mechanism allowing cells to 
recover normal endoplasmic reticulum function, a prolonged UPR 
may induce cell death. UPR, which can be induced by cellular stres¬ 
sors sucb as hypoxia, is activated by each of three molecular sensors: 
IREla, ATF6, PERK [100]. XBPl’s unconventional splicing (occurs in 
the cytosol) by IREla is an obligate component in both IREla- and 
ATF6-induced UPR [100,101]. The UPR (initiated by XBPl splicing 
by IREla) can activate autophagy [102]. Whether this is a pro¬ 
survival or prodeath form of autophagy is unknown, since UPR 
activation also can induce both prodeath and prosurvival outcomes 
[103]. 

XBPl is a transcription factor that belongs to the basic 
region/leucine zipper (bZlP) family [104,105]. The unspliced form, 
XBPl (U), has a molecular weight of ~33 kDa and acts as a dominant 
negative of spliced XBPl [106,107]. The spliced form, XBPl(S), has 
a molecular weight of ~54kDa: splicing removes a 26 bp intron 
and creates a translational frame-shift. Regulation of transcrip¬ 
tion by XBPl(S) is a consequence of its homodimers activating 
specific cAMP response elements (CREs) with a conserved ACGT 
core sequence GATGACGTG(T/G) NNN(A/T)T—sometimes called the 
UPR element [103,104,108]. XBPl(S), which is implicated in affect¬ 
ing plasma cell differentiation [109], is essential for fetal survival, 
neurological development, bone growth, immune system activa¬ 
tion, and liver development [110,111 ]. XBPl is also rapidly induced 
in response to estrogen-stimulation [112,113]. Consistent with the 
work of others [108], we have shown that XBPl(S) can bind to and 
activate ERa in a ligand-independent manner (Fig. 4). 

We have recently shown that XBPl (S) confers E2-independence 
(effectively an Al resistant phenotype) and AE crossresistance (TAM 
and FAS crossresistance) in both MCF-7 and T47D human breast 
cancer cells [76]. This activity appears to be driven primarily by 
XBPl(S), as introduction of the full-length XBPl cDNA in either 
MCF-7 orT47D cells generates predominately the XBPl(S) protein. 
This observation suggests that the basal activity of IREla is already 
adequate and that XBPl(S) is the rate limiting protein. XBPl is the 
only known substrate for the IREla endonuclease and only IREla 
can splice mammalian XBPl. Since XBPl splicing is thought to func¬ 
tion primarily within the UPR, breast cancer cells may be primed to 
respond to multiple stressors by activating a prosurvival induction 
of UPR. 
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Fig. 4. Physical association of XBPl and ERa is accompanied by robust ERE-driven 
transcriptional activity in MCF7/XBP1 cells. (A) MCE-7 cells stably expressing XBPl 
cDNA or the empty vector control (c) were treated with FAS or ethanol control (Ctrl.) 
vehicle prior to lysis and immunoblotting (lanes 1 and 2) or co-immunoprecipitation 
of XBPl and ERa (lanes 3 and 4) using standard procedures. (B) MCF7/c and 
MCF7/XBP1 cells were transiently co-transfected with plasmids encoding 3xERE- 
luciferase and phRLSV40-Renilla for 24 h prior to lysis and promoter-reporter 
luciferase assay by standard methods. Data are presented as mean relative ERE- 
luciferase activity ±SE for a representative experiment performed in triplicate, 
>< 0 . 001 . 


4.2. Interferon regulatory factor-] (IRFl) 

RFLP linkage analysis assigned the IRFl gene to 5q23-31: more 
definitive studies identified the locus as 5q31.1 [114]. IRFl was ini¬ 
tially identified because of its transcriptional activation of type 
I interferon (IFN) genes. We first showed the ability of interfer¬ 
ons to sensitize breast cancer cells to TAM over 20 years ago 

[115] . More recently, IRFl was implicated in T-cell development 

[116] , and it is now known also to coordinate expression of the 
immunoproteasome [117], to regulate human telomerase activ¬ 
ity [118,119], and to regulate key aspects of DNA damage repair 
[120,121]. Loss of IRFl increases tumorigenicity in mouse mod¬ 
els driven by ras or loss of p53 [122]. These activities may reflect 
IRFl’s ability to signal to apoptosis [123], which can occur in a 
p53-dependent or -independent manner [ 120,124], with or without 
induction of p21“P’ [124] or p27''‘P’ [125], and through caspase-1 
[120], caspase-3 [96], caspase-7 [96,126], caspase-8 [96,127], and/or 
FasL[128]. 

Following our initial observations of IRFl’s likely role in breast 
cancer [129-131] and antiestrogen resistance [129], we confirmed 
its functional involvement using a dominant negative approach 
(dnlRFl) [65]. IRFl and dnlRFl induce opposing effects on pro¬ 
liferation in vitro and tumorigenesis in vivo through regulation 
of caspases-3/7 and caspase-8 activities [96]. These observations 
are consistent with the effects of inoculating an adenoviral vec- 
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tor containing IRFl directly into mouse mammary tumors [132]. 
While p53-dependent apoptosis occurs in the breast [133], T47D 
cells express mutant p53 and our data show that Intact p53 is not 
required for the proapoptotic actions of IRFl [65,96]. In AE sen¬ 
sitive breast cancer cells, inhibition of AE-induced IRFl activity by 
dnlRFl Is accompanied by reduced proapoptotic activity [65], These 
observations on IRFl and AE responsiveness have been confirmed 
and extended by others in both normal [134] and other neoplastic 
breast cell culture models [135,136], IRFl, which can signal through 
both p53-dependent and -independent mechanisms [120,124], pro¬ 
vides a new and potentially important signaling molecule for 
Integrating and regulating breast cancer cell survival in response to 
AEs. 

4.3. Nuclear factor kappa B (NF/cB) 

The NFkB p50/p65 heterodimer complex comprises two homol¬ 
ogous proteins: the p50 product of its pl05 precursor (NFkBI; 
chromosome 4q24) and the p65 (RELA; llql3). NFkB is main¬ 
tained in the cytosol in an inactive state, bound with members 
of the IkB family that inhibit nuclear transport or block NFkB’s 
nuclear translocation signal [137], Activation usually proceeds by 
the IKK kinase complex phosphorylatlng IkB, resulting in IkB 
ubiquitination and degradation [138], NFkB (RELA/NFkBI ) is impli¬ 
cated in several critical cellular functions [139], Reflecting its 
regulation by both estrogen and growth factors ] 140,141] that 
are involved in endocrine resistance ]10,142], normal mammary 
gland development Is dependent upon NFkB ]143]. Increased 
NFkB activity arises during neoplastic transformation in the 
rat [144] and mouse mammary gland [145]. Upregulation of 
NFkB Is associated with E2-independence ]140,143]. The predom¬ 
inant NFkB form in breast cancer cell lines is RELA/NFkBI; the 
p52 family member also Is expressed In some breast cancers 
[146]. 

We have shown that NFkB can confer estrogen-independence 
and AE crossresistance ]75,95,147]. Estrogen-independent growth 
in vitro and in vivo is supported by increases in both NFkB DNA bind¬ 
ing activity and expression of BCL3 [147]. This study highlights the 
functional Implications of NFkB In AI resistance. Expression of iKBa 
(NFkB repressor) in estrogen-independent LCCl cells (LCCl cells 
are derived from MCF-7 and are estrogen-independent but sensi¬ 
tive to AEs [148]), which have increased NFkB activation relative 
to estrogen-dependent MCF-7 cells, eliminates their estrogen- 
independence in vivo. 

LCC9 cells (TAM and FAS crossresistant variant of LCCl [28]) 
exhibit a further Increase In NFkB expression and activation rel¬ 
ative to LCCl cells, apparently driven by increased expression 
of NEMO (IKK'y) ]75]. These observations imply that the level 
of activity in LCCl cells is adequate for estrogen-independence 
but not AE resistance. Increased activation of NFkB ]95] and loss 
of Its antiestrogenic regulation in LCC9 cells ]75] suggest that 
these cells might be dependent upon NFkB for survival/growth. 
Thus, we compared the growth response of LCCl and LCC9 cells 
to vehicle or parthenolide (300 and 600 nM), a small molecule 
Inhibitor of NFkB [149], Parthenolide produces a dose-dependent 
inhibition of MCF7/LCC9 cells with an apparent IC 50 of approxi¬ 
mately 600 nM (p<0.01 at both 300 and 600 nM parthenolide). 
In marked contrast, parthenolide does not affect growth of LCCl 
cells at either of these concentrations ]75]. We next asked if 
parthenolide can re-sensitize LCC9 cells to FAS-mediated apop¬ 
tosis. FAS and parthenolide synergize to induce LCC9 cell death 
]75]. Since FAS alone Is inactive ]28], this synergism reflects 
at least a partial reversal of the FAS resistance component of 
the LCC9 cell phenotype and implicates NFkB as a key deter¬ 
minant ]75]. Thus, AE crossresistant cells exhibit a greater 
reliance upon NFkB signaling for proliferation, and inhibition 


of NFkB restores their sensitivity to apoptosis induced by FAS 
[95]. 


4.4. Expression ofER, PGR, XBPl, NFkB and IRFl in breast tumors 

Using gene expression microarrays, we previously compared the 
global structures of the transcrlptomes of three ER+ human breast 
cancer cell lines (MCF-7, T47D, ZR75-1) and 13 human breast tumors 
(11 ER+; 2 ER-) and showed these to be notably similar to ER+ 
breast tumors from patients ]150]. The striking similarities between 
cell lines and tumors are supported by a report that the estrogen- 
regulated genes In these cell lines are similarly regulated in breast 
tumors [151 ]. These data show that ER+ breast cancer cell lines and 
ER+ breast tumors in women share global similarities in the struc¬ 
tures of their respective transcriptomes [150], and that these cell 
lines are appropriate models in which to identify clinically relevant 
endocrine-regulated molecular events ]150,151]. Nonetheless, It Is 
necessary to show that the seed genes we have selected are likely 
to be relevant to the biology of ER+ breast tumors. 

To begin to explore the possible clinical relevance of these func¬ 
tional studies, we first asked if we could detect XBPl, NFkB, and 
IRFl In breast tumors. We then asked whether any of these proteins 
were coexpressed in patterns consistent with the experimental data 
from cell lines. Using a series of breast cancer tissue arrays com¬ 
prising 480 cores from 54 breast carcinomas (mostly ER+ tumors), 
we applied Immunohistochemistry to explore the expression of the 
seed genes ]152]. Pairwise correlation analyses cannot account for 
the possibility that unknown associations among proteins may con¬ 
found each other, so we applied a novel use of partial correlation 
coefficient analysis. Partial correlation analysis allows an estimate 
of the correlation between two variables while controlling for a 
third, fourth and/or fifth and is particularly useful in the analysis of 
small signaling networks of 3-5 variables [153], 

We confirmed the well established coexpression of ERa and PgR, 
implying that the samples are representative of most ER+ breast 
cancers. XBPl, NFkB, and IRFl are each found in a high propor¬ 
tion of breast tumors [152], Total XBPl was measured, as XBPl(S) 
antibodies were not then available. XBPl staining is variable but 
detectable in 79% of breast tumors. A very recent study has reported 
a significant association between XBPl (S) mRNA and poor response 
to endocrine therapy ] 154]— entirely consistent with our studies in 
breast cancer cell lines ]76]. 57% of the tumors express detectable 
RELA in their neoplastic cells, similar to a prior study of n = 17 breast 
tumors [146], 

Expression of several of the proteins is correlated in breast 
tumors. IRFl correlates with ER and PGR, and also with RELA 
and XBPl. While, these correlations depend on the subcellular 
localization of IRFl and some are direct and others inverse corre¬ 
lations, they are fully consistent with the interpretation that these 
expression patterns reflect functionally relevant signaling links. For 
example, we might predict that IRFl sequestered in the cytosol, 
unlike that in the nucleus, cannot act as a proapoptotic TF (the 
full coexpression patterns are described detail in the report by Zhu 
et al. [152]). We also find coexpression of XBPl and RELA, consis¬ 
tent with the observation that XBPl may be downstream of NFkB 
]109]. When each of the significant correlations is examined in the 
partial correlation coefficient models, the IRFl, NFkB, and XBP cor¬ 
relations remain [152], These data are consistent with these three 
reflecting some component of a larger signaling network active in 
some ER+ breast cancers and further support their selection as seed 
genes from which to grow this network and understand its topology 
and function. Moreover, the functional data from our experimen¬ 
tal models implies that this network links signaling and function 
through two key subcellular components—mitochondria and the 
endoplasmic reticulum. 
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Fig. 5. Endocrine resistance seed-gene network. Simple representation of a seed-gene network of XBPl, NFkB and IRFl based on functional data obtained from an appropriate 
cellular context (resistant MCF7/LCC9 cells). 


4.5. Simple representation of a seed-gene network of XBPl, NFkB 
and IRFl based on functional data obtained from an appropriate 
cellular context 

The experimental data supporting the wiring-diagram repre¬ 
sentation of the network model shown in Fig. 5 are discussed the 
preceding sections. Here we discuss how the signals may flow 
through this network. The three primary seed genes of IRFl, XBPl, 
and NFkB are evident as previously proposed [95], IRFl expres¬ 
sion is repressed in resistant cells [95] but induced by antiestrogens 
in sensitive cells [65]. A dominant negative IRFl confers an antie¬ 
strogen resistant phenotype, implying that IRFl-driven prodeath 
signaling is key to the regulation of cell fate [65], 

In addition to changes in the expression of IRFl, the upregula- 
tion of NPM expression [95,155] could also affect IRFl action. Both 
NPM and IRFl are estrogen-regulated genes in MCF-7 cells, IRFl 
expression being suppressed, whereas NPM is induced [129,155]. 
Since NPM inhibits the transcription regulatory activities of IRFl 
[156], the increase in NPM expression could bind remaining IRFl 
and inhibit its ability to initiate an apoptotic caspase cascade. We 
also cannot exclude the possibility that NPM has activities inde¬ 
pendent of blocking IRFl, since NPM overexpression is sufficient 
to transform NIH 3T3 cells in a standard oncogenesis assay [156]. 
Increased levels of serum autoantibodies to NPM predict recurrence 
on TAM 6 -months prior to clinical detection [157]. 

IRFl and NFkB are known to form heterodimers and to regulate 
directly gene expression [158,159] including that of the inducible 
nitric oxide synthase promoter [158]. Since we do not know if it is 
primarily the gene regulatory effects of these heterodimers, or if 
their subcellular location is key (they act by preferentially seques¬ 
tering one or the other so that transcriptional regulation does not 
occur), this is shown as a dotted line. We would predict, based on 
the inverse expression between NFkB and IRFl in LCC9 cells [95] 
and in some breast cancers [152], that either the prodeath effects 
of any remaining IRFl are being sequestered by NFkB in resistant 
cells and/or that the overexpression and activation of NFkB leads to 
a dominance of its prosurvival activities. The increased sensitivity 
of resistant cells to parthenolide is consistent with the functional 
relevance of at least the latter signaling outcome [75[. 

We have previously shown that the upregulation of NFkB in 
antiestrogen resistant cells [95] is likely driven in part by increased 
NEMO/IKK 7 activity [75[. The prosurvival activities of NFkB are 
well documented [160]. Precisely how NFkB regulates cell sur¬ 
vival remains to be fully established but activation of prosurvival 


members of the BCL2 gene family are involved in both acquired 
estrogen-independence [147] and antiestrogen resistance [75,76]. 
While NFkB is predicted to induce transcription of XBPl [109], we 
have yet to report this direct regulation in breast cancer cells (stud¬ 
ies are in progress). Whether or not this occurs, XBPl is clearly 
upregulated in resistant cells [95] and this activity is sufficient 
to confer both estrogen-independence and antiestrogen resistance 
[76]. More recently, increased XBPl mRNA expression has been 
show to predict for a poor response to TAM in breast cancer patients 
[154]. 

The central role of XBPl within the UPR clearly implicates 
UPR activation in responsiveness to both estrogen-withdrawal and 
antiestrogen treatment [76[. UPR also is known to induce autophagy 
[ 102 ], although whether this is a prosurvival or prodeath autophagy 
remains unclear in the context of determining endocrine respon¬ 
siveness. Autophagy is regulated, at least in part, by the action of 
BECNl.BECNl activity is regulated by BCL2, which binds BECNl and 
can block BECNl-mediated autophagy [36]. 

The regulation of BCL2 family members (BCL2, BCL3, and prob¬ 
ably others) whether by IREl, NFkB, and/or XBPl, can affect both 
autophagy and the intrinsic apoptosis pathway. The intersection 
of their signaling at BCL2 family members, as shown in Fig. 5, 
is one location within the broader network where the balance 
between prodeath and prosurvival signaling, and whether prodeath 
is autophagic or apoptotic, is determined. This intersection also 
links signaling through the UPR and endoplasmic reticulum to the 
mitochondria with the cell fate decision mechanisms—at least in 
the context of determining cell fate in the context of endocrine 
responsiveness in breast cancer. The signaling depicted in Fig. 5 
represents only a small component of this broader network. Nev¬ 
ertheless, this initial wiring-diagram is consistent with a body of 
functional data in experimental models and it provides sufficient 
seed genes, their edges, and the directionality of these edges, to 
begin a more detailed exploration of this central network. Under¬ 
standing this network’s topology and function will lead to better 
candidates for drug discovery and to better algorithms to predict 
how individual tumors will respond to specific endocrine therapies. 
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8 Abstract The interferon regulatory factor- 1 (IRFl) gene, 

9 localized on chromosome 5q31.1, is mutated or rearranged 

10 in several cancers including some hematopoietic and gastric 

11 cancers. However, whether loss of IRFl occurs in sporadic 

12 breast cancer is unknown. Loss of 5q 12-31 is reported in 

13 11% of sporadic breast cancers, and high-resolution array- 

14 CGH studies have shown loss at 5q31.1 in 50% of breast 

15 cancers with a mutated BRCAl gene. Functionally, over- 

16 expression of IRFl reduces, and a dominant negative IRFl 

17 construct increases, tumorigenesis of human breast cancer 

18 xenografts. Taken together, these observations indicate that 

19 the IRFl gene may play a potentially important role as a 

20 breast cancer tumor suppressor gene. In this study, we 

21 investigated allelic loss of the IRFl gene in breast tumor 

22 specimens from 52 women with invasive breast cancer 
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locus was detected in 32% of these informative cases (12/ 25 

37). There was a significant association between IRFl loss 26 
and both older age (P = 0.0167) and earlier stage (Stages 1 27 

and 2) (P — 0.0165). To assess the association of IRFl 28 
mRNA expression with clinical outcomes in breast cancer, 29 
we studied data from two published gene expression 30 
microarray datasets. In breast cancer patients, low IRFl 31 


mRNA expression is strongly correlated with both risk of 32 
recurrence (OR = 3.00; P — 0.003; n — 273 cases) and 33 
risk of death (OR = 4.18; P = 0.004; n = 191 cases). Our 34 
findings strongly imply a tumor suppressor role for the IRFl 35 


gene in breast cancer. 36 

37 

Keywords Interferon regulatory factor-1 • IRFl ■ 38 

Breast cancer • Disease survival • Tumor suppressor gene 39 

Introduction 40 


The interferon regulatory factor-1 {IRFl) gene mediates 41 
interferon and other cytokine effects and exhibits antitumor 42 
activity in vivo and in vitro. IRFl can also reverse the 43 
oncogenic transformation of cells induced by the overex- 44 
pression of both RAS and MYC in mouse models [1]. Since 45 
functional roles for RAS and MYC are established in human 46 
breast cancer [2, 3], a loss of IRFl function might be 47 
important in this disease. Functionally, overexpression of 48 


IRFl reduces [4, 5], and a dominant negative IRFl con- 49 
struct increases [4], tumorigenesis of human breast cancer 50 
xenografts. We and others have identified IRFl as a key 51 
regulator of breast cancer cell survival [5-8] that can acti- 52 
vate a caspase cascade [4] and induce apoptosis [5,6]. More 53 
specifically, the proapoptotic effects of IRFl include acti- 54 
vation/regulation of caspases-1 [9], -3 [4], -7 [4, 10], and 55 
-8 [4, 11]. IRFl can also induce apoptosis through both 56 
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7’P5J-dependent and 7P55-independent signaling [9, 12]. 
TP53 is often mutated in breast cancer [13], and many 
breast tumors initially respond to drugs and hormones 
through 7P55-dependent and IPJi-independent signaling. 
We have shown that IRFl is a key determinant of respon¬ 
siveness to antiestrogen therapies in breast cancer [6, 7]. 

Whether IRFl is a true tumor suppressor gene (TSG) in 
breast cancer is unknown. Established TSGs often show 
evidence of homozygous or heterozygous gene loss. For 
instance, while loss of BRCAl function in inherited breast 
cancers is usually a consequence of gene mutation(s), loss 
of BRCAl expression in sporadic breast cancers is often the 
result of loss of heterozygosity (LOH) accompanied by 
hypermethylation of a CpG island in the 5' region close to 
the transcription start site of the remaining allele [14, 15]. 
IRFl has been implicated as a putative TSG in leukemias 
and preleukemic myelodysplasias, and IRFl is either 
mutated or rearranged in some hematopoietic disorders 
[16]. IRFl was shown to be the true target of frequent 
deletions (LOH) in esophageal cancer [17] and gastric 
cancer [18]. IRFl is located at 5q31.1, a region shown to be 
commonly lost in two large studies evaluating breast 
tumors by chromosomal comparative genomic hybridiza¬ 
tion (CGH). Deletion of 5ql2-31 is detected in 11% of 
sporadic breast cancers [19] and 5q deletion is seen in 86% 
of BRCAl tumors [20]. More recently, a high-resolution 
array-CGH study has shown loss at 5q31.1 in 50% of 
fiPCAf-positive breast cancers [21]. Whether loss at the 
IRFl locus is the driver in these cancers and whether IRFl 
gene loss occurs in sporadic breast cancers are unknown. 

IRFl is inactivated by a point mutation in gastric cancer, 
suggesting that the loss of function of IRFl may be critical 
for the development of this disease [18]. When attempting 
to generate an IRFl riboprobe from MCF-7 breast cancer 
cells mRNA, we found a single nucleotide polymorphism 
(SNP) in the IRFl coding region [22] and a novel IRFl 
splice variant (K. B. Bouker; unpublished observations). 
The IRFl A4396G SNP is more frequent in human breast 
cancer cell lines than in the general population and is more 
frequently expressed in African-American than Caucasian 
subjects [22]. It is not known whether IRFl A4396G 
contributes to the earlier age [23] or higher stage at diag¬ 
nosis [24] or the lower overall survival rate of African- 
American compared with non-Hispanic white and Hispanic 
women [25]. When considered together, these observations 
strongly suggest that IRFl may be a TSG in breast cancer. 
Since one of the key features of TSGs is somatic loss, we 
designed this study to determine the incidence of IRFl loss 
in a series of 52 invasive breast tumors. Considering that 
IRFl LOH might be expected to reduce mRNA expression, 
we also explored whether low IRFl mRNA expression is 
associated with poor clinical outcomes in breast cancer 
patients. 


Methods 

We determined IRFl loss by LOH in breast tissue speci¬ 
mens from 52 patients with sporadic breast cancer obtained 
from the tumor bank at the Lombardi Comprehensive 
Cancer Center (LCCC). In each case, a paraffin block with 
breast tumor tissues and a second block with normal tissues 
(skin, negative lymph node, or a normal breast tissue 
block) were identified. An H&E-stained slide from each 
block was evaluated by a breast pathologist to confirm the 
diagnosis and mark the areas with malignant tissue or 
normal tissue. A 100-pm consecutive section was obtained 
from each block, and the tissues of interest were grossly 
microdissected with a razor blade to isolate malignant 
cells. Corresponding normal cells from a different block 
were obtained for each case. DNA was extracted from the 
tissue using the DNeasy QIAGEN kit according to manu¬ 
facturer’s instructions (QIAGEN Inc. Valencia, California). 

To study LOH at the IRFl locus, we selected an intra¬ 
genic, dinucleotide, polymorphic marker {IRFl Dinucleo¬ 
tide Repeat, Allele Set GDB: 211036), with a high degree 
of heterozygosity (74% heterozygosity). The sequences of 
the oligonucleotide primers were obtained from the Gen¬ 
ome Data Base (GDB) (http://www.gdb.org): Forward: 
5'-ATGGCAGATAGGTCCACCGG-37Reverse: 5'-TCAT 
CCTCATCTGTTGTACG-3'. Primers were fiuorescently 
labeled and PCR amplification was performed using a 
standard protocol. Allele sizes were determined by elec¬ 
trophoresis of PCR products in 6% denaturing polyacryl¬ 
amide gels and were compared to ROX 500 size standards 
(Applied Biosystems, Foster City, CA), using an automated 
sequencer (ABI 377), according to the manufacturer’s 
instructions (Applied Biosystems). Fluorescent signals 
from the different size alleles were recorded and analyzed 
using GENOTYPER version 2.1 and GENESCAN version 
3.1 software (Applied Biosystems). Following visual 
examination of computer printouts by two independent 
observers, LOH was determined mathematically according 
to the Genotyper User’s Manual (Applied Biosystems). 

The publicly available ONCOMINE cancer gene 
expression microarray database [www.oncomine.org; 26] 
was used to search for relationships between IRFl mRNA 
expression and outcomes in breast cancer clinical studies. 
Normalized Affymetrix MAS 5.0 gene expression data, 
originally published by Wang et al. [27] and Desmedt et al. 
[28], were downloaded from ONCOMINE. Statistical 
analysis was done using SigmaStat (Systat Software, Inc., 
San Jose, CA) and S-PLUS (Insightful, Seattle, WA). 
Median IRFl expression values across all samples, and 
between the top and bottom quartiles of expression, were 
compared by Mann-Whitney Rank-Sum test, and odds 
ratios were calculated for the association of low IRFl 
expression with poor outcomes following logistic 
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regression analysis. Statistical significance was defined as 
>95% confidence interval, or a = 0.05. 


Results 

In this study, 37 cases (71%) were informative for the IRFl 
dinucleotide marker used for LOH analysis. LOH was 
detected in 12 of these informative cases (12/37; 32%). 
Figure 1 shows a representative case with no LOH 
(Fig. la, bottom panel) and another representative case 
with LOH (Fig. lb, bottom panel). A significant correlation 
was found between LOH at the IRFl locus and both older 
age (P value = 0.0167 based on a two sample t-test) and 
earlier stage (Stages 1 and 2) (P value = 0.0165 based on 
Fisher’s exact test). 

The data from two published gene expression micro- 
array datasets were assessed to investigate mRNA 
expression and clinical outcome [27, 28]. In both studies, 
median IRFl mRNA expression, as detected by Affymetrix 
U133A GeneChips, was signihcantly reduced in patients 
with a worse outcome (recurrence, P = 0.004; death, 
P = 0.021). Moreover, when these expression values data 
were separated into quartiles and analyzed by logistic 
regression (Table 1), low IRFl mRNA expression (first 
compared with fourth quartile) was significantly associated 
with both recurrence (OR = 3.00, P = 0.003) and death 
(OR = 4.18, P = 0.004). 

Discussion 

In our study, a significant correlation was found between 
LOH at the IRFl locus and both older age (P = 0.0167) 
and earlier stage (Stages 1 and 2) (P = 0.0165). An inverse 
association between IRFl protein expression and tumor 
grade has been reported [29], and IRFl protein levels are 
lower in breast tumors than in adjacent normal cells [30]. 
However, subcellular location also affects IRFl correlation 
with clinical measures, and these correlations would not be 

Fig. 1 LOH analysis at the 
IRFl locus of two 
representative breast cancer 
cases, a A case with no LOH. 
b A case with LOH. (In a, b, 

Top panels shows the analysis 
performed in normal cells and 
bottom panels the analysis 
performed in tumor cells.) 


apparent with LOH measurements. For example, cytosolic 
IRF-1 protein (but not nuclear IRFl) is associated with age 
(similar to LOH hndings) and ER expression, whereas 
nuclear IRFl protein expression (but not cytosolic IRFl) 
correlates with PgR expression [31]. 

Since IRFl LOH might be expected to reduce mRNA 
expression, we explored the association of IRFl mRNA 
with the key measures of recurrence status (recurrent; non¬ 
recurrent) and vital status (alive; dead). Using the publicly 
available ONCOMINE cancer gene expression microarray 
database [26], we searched for relationships between IRFl 
mRNA expression and outcomes from two breast cancer 
clinical studies. The Wang et al. study includes 273 women 
diagnosed with lymph node-negative breast cancer who 
had not received systemic adjuvant therapy, but whose 
tumors are representative of a wide range of clinical/ 
pathological features (including age, stage, and tumor size). 
The original goal of that study was to identify a gene 
expression signature that could predict recurrence of met¬ 
astatic breast cancer in women with node-negative disease 
[27]. The more recent Desmedt et al. study is an inde¬ 
pendent validation of Wang et al. and includes 191 
untreated patients less than 61 years of age with node¬ 
negative, T1 and T2 breast cancer [28]. The primary 
endpoint for the study was overall survival, and median 
follow-up time was 13.6 years. While IRFl gene copy 
number data and protein expression data were not available 
from these studies, we could assess the potential of IRFl to 
act as a TSG as predicted by the likelihood that low IRFl 
mRNA expression was associated with poor clinical out¬ 
comes. We have observed that low IRFl mRNA expression 
is strongly correlated with both risk of recurrence and risk 
of death. Studies to determine directly the role of IRFl 
LOH in these outcomes are in progress. 

IRFl LOH in breast cancer may reflect haploinsuffi- 
ciency; over 35 TSGs are known in which haploinsuffi- 
ciency accounts for their contribution to carcinogenesis 
(though not necessarily breast carcinogenesis) [32, 33]. 
Reduced IRFl activity in experimental breast cancer mod¬ 
els is functionally associated with increased cell survival. 
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Table 1 IRFl mRNA 

expression and clinical outcomes 




Study 

Endpoint 

IRFl (median) 

IRFl (quartile) 

Odds ratio“ 

Wang et al. 

No disease (n = 180) 

P = 0.004 

First = 0.971 

OR = 3.00 





Cl = 1.44-6.27 


Recurrence (n = 93) 


Fourth = 0.455 

P = 0.003 

Desmedt et al. 

Alive (n = 135) 

P = 0.021 

First = 0.905 

OR = 4.18 





Cl = 1.56-11.21 


Dead (n = 56) 


Fourth = 0.355 

P = 0.004 


“ Odds ratios were calculated for the first (highest) versus fourth (lowest) quartiles of expression data, and denote the association of low IRFl 
mRNA expression with poor outcome (recurrence or death); Cl = 95% confidence interval 


reduced caspase activation, and apoptosis (4-6). While 
LOH may not be the only contributor to low IRFl expres¬ 
sion [34, 35], the strong association of low IRFl mRNA and 
recurrence status and survival imply an important TSG role 
for IRFl in many sporadic breast cancers. 
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